# **MMIXware**

A RISC Computer for the Third Millennium



## Lecture Notes in Computer Science

1750

Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen


### Donald E. Knuth



#### Author

Donald E. Knuth Computer Science Department Stanford University Stanford, CA 94305-9045, USA

ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-540-66938-8 e-ISBN 978-3-540-46611-6 DOI 10.1007/3-540-46611-8 Springer Heidelberg New York Dordrecht London

#### LNCS Sublibrary: SL 2 – Programming and Software Engineering

The following copyright notice is included in all files of the MMIXware package:

#### © 1999 Donald E. Knuth

This file may be freely copied and distributed, provided that no changes whatsoever are made. All users are asked to help keep the MMIXware files consistent and "uncorrupted," identical everywhere in the world. Changes are permissible only if the modified file is given a new name, different from the names of existing files in the MMIXware package, and only if the modified file is clearly identified as not being part of that package. (The CWEB system has a "change file" facility by which users can easily make minor alterations without modifying the master source files in any way. Everybody is supposed to use change files instead of changing the files.) The author has tried his best to produce correct and useful programs, in order to help promote computer science research, but no warranty of any kind should be assumed.

Usage of those files in derived works is otherwise unrestricted.

All portions of the present book that are not distributed as part of the MMIXware files are copyright © 1999, corrected printing 2014 by Springer-Verlag. All rights for those portions (including the special indexes) are reserved.


MMIX is a computer intended to illustrate machine-level aspects of programming. In my books *The Art of Computer Programming*, it replaces MIX, the 1960s-style machine that formerly played such a role. MMIX's so-called RISC ("Reduced Instruction Set Computer") architecture is much better able to represent the computers being built at the turn of the millennium.

I strove to design MMIX so that its machine language would be simple, elegant, and easy to learn. At the same time I was careful to include all of the complexities needed to achieve high performance in practice, so that MMIX could in principle be built and even perhaps be competitive with some of the fastest general-purpose computers in the marketplace. I hope that MMIX will therefore prove to be a useful vehicle for people who are studying how to improve compilers and operating systems, and that other authors will like MMIX well enough to make use of it in their own textbooks. My goal in this work is to provide a clean, complete, and well-documented "machine-independent machine" that people all over the world will be able to use as a testbed for long-term research projects of lasting value, even as real computers continue to change rapidly.

This book is a collection of programs that make MMIX a virtual reality. One of the programs is an assembler, MMIXAL, which converts MMIX symbolic files to MMIX object files. There also are two simulators, which execute the programs in given object files. The first simulator, called MMIX-SIM or simply MMIX, executes a program one instruction at a time and allows convenient debugging. The second simulator, MMMIX, simulates a high-performance pipeline in which many aspects of the computation are overlapped in time. MMMIX is in fact a highly configurable "meta-simulator," capable of simulating an enormous variety of different kinds of pipelines with any number of functional units and with many possible strategies for caching, virtual address translation, branch prediction, super-scalar instruction issue, etc., etc.

The programs in this book are somewhat primitive, because they all are based on a simple terminal interface: Users type commands and the computer types out a reply. Still, these programs are adequate to provide a basis for future developments. I'm hoping that at least one reader of this book will discover how much fun MMIX programming can be and will be motivated to create a nice graphical interface, so that other people will more easily be able to join in the fun. I don't have the time or talent to construct a good GUI myself, but I've tried to write the programs in such a way that modifications and enhancements will be easy to make.

The latest versions of all these programs can be downloaded from MMIX's home page

http://mmix.cs.hm.edu/

in a file named mmix-YYYYMMDD.tar.gz. The programs are copyrighted, but anyone can use them without charge. Furthermore I explicitly allow anybody to copy and modify the programs in any way they like, provided only that the computer files are given different names whenever they have been changed. Only my designated successors in Munich are allowed to make a correction or addition to the copyrighted file mmixal.w, for example, unless the corrected file is identified by some other name (possibly 'turbo-mmixal.w' or 'mmixal++.w', etc.).


The programs are all written in CWEB, a language that combines C with TeX in such a way that standard preprocessors can easily convert mmixal.w into a compilable file mmixal.c or a documentation file mmixal.tex. CWEB also includes a "change file" mechanism by which people can easily customize a master source file like mmixal.w without changing the master file in any way. (See

http://www-cs-faculty.stanford.edu/~knuth/cweb.html

for complete information about CWEB, including installation instructions for the related software.) Readers of the present book who are unfamiliar with CWEB might want to refer to the notes on "How to read CWEB programs" that appear on pages 70–73 of my book *The Stanford GraphBase* (New York: ACM Press, 1993), but the general ideas are almost self-explanatory so I decided not to reprint those notes here.

During the next several years, as I write Volume 4 of *The Art of Computer Programming*, I plan to prepare updates to Volumes 1–3 whenever Volume 4 needs to refer to new material that belongs more properly in earlier volumes. These updates, called "fascicles," will be available on the Internet via

http://www-cs-faculty.stanford.edu/~knuth/taocp.html

and they will also be published in hardcopy form. The first such fascicle is already finished and available for downloading; it is a tutorial introduction to MMIX and the MMIX assembly language. Everybody who is seriously interested in MMIX should read that First Fascicle, preferably before reading the programs in the present book.

I've tried to make the MMIXware programs interesting to read as well as useful. Indeed, the MMIX-PIPE program, which is the chief component of the MMMIX metasimulator, is one of the most instructive programs I've ever had the pleasure of writing. But I don't expect a great number of people to study every part of this book closely, or even to study every part of MMIX-PIPE. The main purpose of this book is to provide a complete documentation of the MMIX computer and its assembly language. Many details about MMIX were too "picky" or too system-oriented to be appropriate for the First Fascicle, but every detail about MMIX can be found in the present book.

After the MMIXware programs have been installed on a UNIX-like system, they are typically used as follows. First a user program is written in assembly language and put into a file, say foo.mms. (The suffix .mms stands for "MMIX symbolic.") Then the command

mmixal foo.mms

will translate it into an object file, foo.mmo. Alternatively, a command such as

mmixal -l foo.lst foo.mms

could be used; this would produce a listing file, foo.lst, in addition to foo.mmo. The listing file, when printed, would show the contents of foo.mms together with the assembled machine language instructions.


Once an object file like foo.mmo exists, it can be run on the simple simulator by issuing a command such as

mmix foo

(or mmix foo.mmo). Many options are also possible; for example,

mmix -s foo

will print running time statistics when the program ends;

mmix -P foo

will print a profile that shows exactly how often each instruction was executed;

mmix -v foo

will give "verbose" details about everything the simulator did;

mmix -t2 foo

will trace each instruction the first two times it is performed; and so on. Also

mmix -i foo

will run the simulator in interactive mode, obeying various online commands by which the user can watch exactly what is happening when key parts of the program are reached. The command

mmix foo bar

will run the simulator as if MMIX itself were running the command 'foo bar' with a rudimentary operating system; any number of command-line arguments can follow the name of the program being simulated.

The MMMIX meta-simulator can also be applied to the same program, although a bit more preparation is necessary. First the command

mmix -Dfoo.mmb foo bar

will dump out a binary file foo.mmb containing the information needed to load 'foo bar' into MMIX's memory. Then a command like

mmmix plain.mmconfig foo.mmb

will invoke the meta-simulator with a "plain" pipeline configuration. The meta-simulator always runs interactively, using the prompt 'mmmix>' when it wants instructions about what to do next. Users can type '?' in response to this prompt if they want to be reminded about what the simulator can do. Typical responses are 'vff' (run verbosely); 'v0' (run quietly); 'p' (show the pipeline); 'g255' (show global register 255); 'D' (show the D-cache); 'b200' (pause when location #200 is fetched); '1000' (run 1000 cycles); etc. Some familiarity with MMIX-PIPE is necessary to understand the meta-simulator's reports of its activity, but users of mmmix are assumed to be able to extract high-level information from a mass of low-level details. (This talent, after all, is the hallmark of a computer scientist.)

The programs in this book appear in alphabetical order:

MMIX explains everything about the MMIX architecture.

MMIX-ARITH contains subroutines for 64-bit fixed and floating point arithmetic, using only 32-bit fixed point arithmetic.

MMIX-CONFIG processes configuration files for MMMIX.

MMIX-IO contains subroutines for the primitive input/output operations of a rudimentary operating system.

MMIX-MEM handles memory references of MMMIX in special cases associated with memory-mapped input/output.

MMIX-PIPE does the hard work of pipeline simulation.

MMIX-SIM is the program for the non-pipelined simulator.

MMIXAL is the assembly program.

MMMIX is the driver program for the meta-simulator.

MMOTYPE is a utility program that translates an MMIX object file into human-readable form.

The first of these, MMIX, is not actually a program, although it has been formatted as a CWEB document; it is a complete definition of MMIX, including the details of features that are used only by the operating system. It should be read first, but the other programs can be read in any order. (Actually MMIXAL or MMIX-SIM should probably be read next after MMIX, and MMIX-PIPE last. The program MMIX-SIM is the line-at-a-time simulator that is known simply as mmix after it has been compiled.)

Mini-indexes have been provided on each right-hand page of this book so that the programs can be read essentially as hypertext. Every identifier that is used on a two-page spread but defined on some other page is listed in the mini-index. For example, a mini-index entry such as 'oplus: octa (), MMIX-ARITH §5' means that the identifier oplus denotes a function defined in §5 of the MMIX-ARITH module, returning a value of type octa. A master index to all uses of all identifiers appears at the end of this book.

Happy hacking!

Donald E. Knuth Cambridge, Massachusetts 17 October 1999

# CONTENTS

| Chapter      | Description             |
|--------------|-------------------------|
| README       | (a preface)             |
| WELCOME      | (an explanation)        |
| MMIX         | (a definition)          |
| MMIX-ARITH   | (a library)             |
| MMIX-CONFIG  | (a part of MMMIX)       |
| MMIX-IO      | (a library)             |
| MMIX-MEM     | (a triviality)          |
| MMIX-PIPE    | (a part of MMMIX)       |
| MMIX-SIM     | (a simulator)           |
| MMIXAL       | (an assembler)          |
| MMMIX        | (a meta-simulator)      |
| MMOTYPE      | (a utility program)     |
| Master Index | (a table of references) |



Welcome to the revised printing of MMIXware, which incorporates hundreds of detailed changes suggested by readers of the original 1999 printing. I thank the staff at Springer for providing this opportunity to make an archival, corrected version of the entire text.

The current printing documents Version 1 of MMIX, and it corresponds to the programs of mmix-20131017.tgz. Version 1 is permanently frozen, and "bug-free by definition." All future developments can be accessed via the MMIX home page cited above in the README section (the "frontmatter").

Each of the following ten chapters begins on a left-hand page, and represents a component of the official CWEB source files for MMIX Version 1.

DEK, 17 October 2013

**1. Introduction to MMIX.** Thirty-eight years have passed since the MIX computer was designed, and computer architecture has been converging during those years towards a rather different style of machine. Therefore it is time to replace MIX with a new computer that contains even less saturated fat than its predecessor.

Exercise 1.3.1–25 in the third edition of *Fundamental Algorithms* speaks of an extended MIX called MixMaster, which is upward compatible with the old version. But MixMaster itself is hopelessly obsolete; although it allows for several gigabytes of memory, we can't even use it with ASCII code to get lowercase letters. And ouch, the standard subroutine calling convention of MIX is irrevocably based on self-modifying code! Decimal arithmetic and self-modifying code were popular in 1962, but they sure have disappeared quickly as machines have gotten bigger and faster. A completely new design is called for, based on the principles of RISC architecture as expounded in *Computer Architecture* by Hennessy and Patterson (Morgan Kaufmann, 1996).

So here is MMIX, a computer that will totally replace MIX in the "ultimate" editions of *The Art of Computer Programming*, Volumes 1–3, and in the first editions of the remaining volumes. I must confess that I can hardly wait to own a computer like this.

How do you pronounce MMIX? I've been saying "em-mix" to myself, because the first 'M' represents a new millennium. Therefore I use the article "an" instead of "a" before the name MMIX in English phrases like "an MMIX simulator."

Incidentally, the *Dictionary of American Regional English* **3** (1996) lists "mommix" as a common dialect word used both as a noun and a verb; to mommix something means to botch it, to bollix it. Only time will tell whether I have mommixed the definition of MMIX.

**2.** The original MIX computer could be operated without an operating system; you could bootstrap it with punched cards or paper tape and do everything yourself. But nowadays such power is no longer in the hands of ordinary users. The MMIX hardware, like all other computing machines made today, relies on an operating system to get jobs started in their own address spaces and to provide I/O capabilities.

Whenever anybody has asked if I will be writing about operating systems, my reply has always been "Nix." Therefore the name of MMIX's operating system, NNIX, will come as no surprise. From time to time I will necessarily have to refer to things that NNIX does for its users, but I am unable to build NNIX myself. Life is too short. It would be wonderful if some expert in operating system design became inspired to write a book that explains exactly how to construct a nice, clean NNIX kernel for an MMIX chip.

**3.** I am deeply grateful to the many people who have helped me shape the behavior of MMIX. In particular, John Hennessy and (especially) Dick Sites have made significant contributions.

**4.** A programmer's introduction to MMIX appears in "Volume 1, Fascicle 1," a booklet containing tutorial material that will ultimately appear in the fourth edition of *The Art of Computer Programming*. The description in the following sections is rather different, because we are concerned about a complete implementation, including all of the features used by the operating system and invisible to normal programs. Here it is important to emphasize exceptional cases that were glossed over in the tutorial, and to consider nitpicky details about things that might go wrong.


**5. MMIX basics.** MMIX is a 64-bit RISC machine with at least 256 general-purpose registers and a 64-bit address space. Every instruction is four bytes long and has the form

| OP | X | Y | Z |
|----|---|---|---|

The 256 possible OP codes fall into a dozen or so easily remembered categories; an instruction usually means, "Set register X to the result of Y OP Z." For example,

| 32 | 1 | 2 | 3 |
|----|---|---|---|

sets register 1 to the sum of registers 2 and 3. A few instructions combine the Y and Z bytes into a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field. But the three bytes X, Y, Z usually have three-pronged significance independent of each other.
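The four-byte layout can be made concrete with a minimal sketch. It assumes that the instruction is viewed as one 32-bit number whose most significant byte is OP; the function names `encode`, `decode`, `yz_field`, and `xyz_field` are hypothetical helpers, not part of MMIXware.

```python
def encode(op, x, y, z):
    """Pack the four bytes OP, X, Y, Z into one 32-bit instruction word."""
    for field in (op, x, y, z):
        if not 0 <= field <= 0xFF:
            raise ValueError("each field must fit in one byte")
    return (op << 24) | (x << 16) | (y << 8) | z


def decode(word):
    """Split a 32-bit instruction word into its OP, X, Y, Z bytes."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("an instruction is exactly four bytes long")
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF


def yz_field(word):
    """The 16-bit YZ field that a few instructions use."""
    return word & 0xFFFF


def xyz_field(word):
    """The 24-bit XYZ field that two of the jump instructions use."""
    return word & 0xFFFFFF


# The example from the text: OP = 32, X = 1, Y = 2, Z = 3.
add_1_2_3 = encode(32, 1, 2, 3)
```

Here `decode(add_1_2_3)` recovers the four bytes (32, 1, 2, 3), while `yz_field` and `xyz_field` read the same word as wider operand fields.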

Instructions are usually represented in a symbolic form corresponding to the MMIX assembly language, in which each operation code has a mnemonic name. For example, operation 32 is ADD, and the instruction above might be written 'ADD \$1,\$2,\$3'; a dollar sign '\$' symbolizes a register number. In general, the instruction ADD \$X,\$Y,\$Z is the operation of setting \$X = \$Y + \$Z. An assembly language instruction with two commas has three operand fields X, Y, Z; an instruction with one comma has two operand fields X, YZ; an instruction with no comma has one operand field, XYZ; an instruction with no operands has X = Y = Z = 0.

Most instructions have two forms, one in which the Z field stands for register \$Z, and one in which Z is an unsigned "immediate" constant. Thus, for example, the command 'ADD \$X,\$Y,\$Z' has a counterpart 'ADD \$X,\$Y,Z', which sets \$X = \$Y + Z. Immediate constants are always nonnegative. In the descriptions below we will introduce such pairs of instructions by writing just 'ADD \$X,\$Y,\$Z|Z' instead of naming both cases explicitly.

The operation code for ADD \$X,\$Y,\$Z is 32, but the operation code for ADD \$X,\$Y,Z is 33. The MMIX assembler chooses the correct code by noting whether the third argument is a register number or not.

Register numbers and constants can be given symbolic names; for example, the assembly language instruction 'x IS \$1' makes x an abbreviation for register number 1. Similarly, 'FIVE IS 5' makes FIVE an abbreviation for the constant 5. After these abbreviations have been specified, the instruction ADD x,x,FIVE increases \$1 by 5, using opcode 33, while the instruction ADD x,x,x doubles \$1 using opcode 32. Symbolic names that stand for register numbers conventionally begin with a lowercase letter, while names that stand for constants conventionally begin with an uppercase letter. This convention is not actually enforced by the assembler, but it tends to reduce a programmer's confusion.
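The choice between opcodes 32 and 33 can be sketched as follows. This is only an illustration of the rule stated above, not the logic of MMIXAL itself; the functions `resolve` and `assemble_add` and the dictionary representation of IS definitions are assumptions made for the example.

```python
ADD_REGISTER = 32    # ADD $X,$Y,$Z (opcode stated in the text)
ADD_IMMEDIATE = 33   # ADD $X,$Y,Z  (opcode stated in the text)


def resolve(operand, symbols):
    """Return (is_register, number) for an operand such as '$1', '5', 'x' or 'FIVE'."""
    operand = symbols.get(operand, operand)   # expand an 'IS' abbreviation, if any
    if operand.startswith("$"):
        return True, int(operand[1:])
    return False, int(operand)


def assemble_add(x, y, z, symbols=None):
    """Assemble one ADD instruction into its four bytes OP, X, Y, Z."""
    symbols = symbols or {}
    x_is_reg, x_num = resolve(x, symbols)
    y_is_reg, y_num = resolve(y, symbols)
    z_is_reg, z_num = resolve(z, symbols)
    if not (x_is_reg and y_is_reg):
        raise ValueError("the X and Y operands of ADD must be registers")
    for number in (x_num, y_num, z_num):
        if not 0 <= number <= 255:
            raise ValueError("each operand must fit in one byte")
    # The third operand alone decides which of the two opcodes is used.
    opcode = ADD_REGISTER if z_is_reg else ADD_IMMEDIATE
    return bytes([opcode, x_num, y_num, z_num])


# The abbreviations from the text: 'x IS $1' and 'FIVE IS 5'.
SYMBOLS = {"x": "$1", "FIVE": "5"}
```

With these definitions, `assemble_add("x", "x", "FIVE", SYMBOLS)` uses opcode 33 and `assemble_add("x", "x", "x", SYMBOLS)` uses opcode 32, matching the two examples in the paragraph above.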

**6.** A *nybble* is a 4-bit quantity, often used to denote a decimal or hexadecimal digit. A *byte* is an 8-bit quantity, often used to denote an alphanumeric character in ASCII code. The Unicode standard extends ASCII to essentially all the world's languages by using 16-bit-wide characters called *wydes*. (Weight watchers know that two nybbles make one byte, but two bytes make one wyde.) In the discussion below we use the term *tetrabyte* or "tetra" for a 4-byte quantity, and the similar term *octabyte* or "octa" for an 8-byte quantity. Thus, a tetra is two wydes, an octa is two tetras; an octabyte has 64 bits. Each MMIX register can be thought of as containing one octabyte, or two tetras, or four wydes, or eight bytes, or sixteen nybbles.

When bytes, wydes, tetras, and octas represent numbers they are said to be either signed or unsigned. An unsigned byte is a number between 0 and  $2^8-1=255$  inclusive; an unsigned wyde lies, similarly, between 0 and  $2^{16}-1=65535$ ; an unsigned tetra lies between 0 and  $2^{32}-1=4,294,967,295$ ; an unsigned octa lies between 0 and  $2^{64}-1=18,446,744,073,709,551,615$ . Their signed counterparts use the conventions of two's complement notation, by subtracting respectively  $2^8$ ,  $2^{16}$ ,  $2^{32}$ , or  $2^{64}$  times the most significant bit. Thus, the unsigned bytes 128 through 255 are regarded as the numbers -128 through -1 when they are evaluated as signed bytes; a signed byte therefore lies between -128 and +127, inclusive. A signed wyde is a number between -32768 and +32767; a signed tetra lies between -2,147,483,648 and +2,147,483,647; a signed octa lies between -9,223,372,036,854,775,808 and +9,223,372,036,854,775,807.
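The rule "subtract $2^8$, $2^{16}$, $2^{32}$, or $2^{64}$ times the most significant bit" can be checked with a short sketch. The names `to_signed` and `to_unsigned` are hypothetical, and the quantity's width is passed as a number of bytes (1, 2, 4, or 8).

```python
def to_signed(value, nbytes):
    """Reinterpret an unsigned nbytes-wide quantity as a two's complement number."""
    bits = 8 * nbytes
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given number of bytes")
    most_significant_bit = value >> (bits - 1)
    return value - (most_significant_bit << bits)


def to_unsigned(value, nbytes):
    """Inverse direction: reduce a signed number modulo 2**(8*nbytes)."""
    return value % (1 << (8 * nbytes))
```

For instance, `to_signed(128, 1)` is -128 and `to_signed(255, 1)` is -1, in agreement with the byte example above.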

The virtual memory of MMIX is an array M of  $2^{64}$  bytes. If k is any unsigned octabyte, M[k] is a 1-byte quantity. MMIX machines do not actually have such vast memories, but programmers can act as if  $2^{64}$  bytes are indeed present, because MMIX provides address translation mechanisms by which an operating system can maintain this illusion.

We use the notation  $M_{2^t}[k]$  to stand for a number consisting of  $2^t$  consecutive bytes starting at location  $k \wedge (2^{64} - 2^t)$ . (The notation  $k \wedge (2^{64} - 2^t)$  means that the least significant t bits of k are set to 0, and only the least 64 bits of the resulting address are retained. Similarly, the notation  $k \vee (2^t - 1)$  means that the least significant t bits of k are set to 1.) All accesses to  $2^t$ -byte quantities by MMIX are *aligned*, in the sense that the first byte is a multiple of  $2^t$ .

Addressing is always "big-endian." In other words, the most significant (leftmost) byte of  $M_{2^t}[k]$  is  $M_1[k \wedge (2^{64} - 2^t)]$  and the least significant (rightmost) byte is  $M_1[k \vee (2^t - 1)]$ . We use the notation  $s(M_{2^t}[k])$  when we want to regard this  $2^t$ -byte number as a *signed* integer. Formally speaking, if  $l = 2^t$ ,

$$s(M_l[k]) = (M_1[k \wedge (-l)] M_1[k \wedge (-l) + 1] \dots M_1[k \vee (l-1)])_{256} - 2^{8l}[M_1[k \wedge (-l)] \geq 128].$$
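The notation $M_{2^t}[k]$ and its signed counterpart can be sketched directly from the definitions above. The sketch assumes that memory is modeled as a Python dictionary from addresses to byte values, with absent bytes reading as zero; that model and the function names are illustrative assumptions only.

```python
MASK64 = (1 << 64) - 1


def aligned_range(k, t):
    """First and last byte addresses of M_{2^t}[k]."""
    size = 1 << t
    first = k & ((1 << 64) - size) & MASK64   # least significant t bits set to 0
    last = (k | (size - 1)) & MASK64          # least significant t bits set to 1
    return first, last


def load_unsigned(memory, k, t):
    """M_{2^t}[k]: the big-endian value of 2^t consecutive aligned bytes."""
    first, last = aligned_range(k, t)
    value = 0
    for address in range(first, last + 1):
        value = (value << 8) | memory.get(address, 0)   # leftmost byte is most significant
    return value


def load_signed(memory, k, t):
    """s(M_{2^t}[k]): subtract 2^(8l) when the leading byte is >= 128, where l = 2^t."""
    l = 1 << t
    first, _ = aligned_range(k, t)
    value = load_unsigned(memory, k, t)
    if memory.get(first, 0) >= 128:
        value -= 1 << (8 * l)
    return value


# Eight sample bytes stored at locations 0x1000 through 0x1007.
memory = {0x1000 + i: b for i, b in enumerate(bytes.fromhex("0123456789abcdef"))}
```

Both `load_unsigned(memory, 0x1002, 1)` and `load_unsigned(memory, 0x1003, 1)` refer to the same wyde, because the least significant address bit is ignored.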

**7. Loading and storing.** Several instructions can be used to get information from memory into registers. For example, the "load tetra unsigned" instruction LDTU \$1,\$4,\$5 puts the four bytes  $M_4[\$4+\$5]$  into register 1 as an unsigned integer; the most significant four bytes of register 1 are set to zero. The similar instruction LDT \$1,\$4,\$5, "load tetra," sets \$1 to the *signed* integer  $s(M_4[\$4+\$5])$ . (Instructions generally treat numbers as signed unless the operation code specifically calls them unsigned.) In the signed case, the most significant four bytes of the register will be copies of the most significant bit of the tetrabyte loaded; thus they will be all 0s or all 1s, depending on whether the number is  $\geq 0$  or < 0.

• LDB \$X,\$Y,\$Z|Z 'load byte'.

Byte s(M[\$Y + \$Z]) or s(M[\$Y + Z]) is loaded into register X as a signed number between -128 and +127, inclusive.

• LDBU \$X,\$Y,\$Z|Z 'load byte unsigned'.

Byte M[\$Y + \$Z] or M[\$Y + Z] is loaded into register X as an unsigned number between 0 and 255, inclusive.

• LDW \$X,\$Y,\$Z|Z 'load wyde'.

Bytes  $s(M_2[\$Y + \$Z])$  or  $s(M_2[\$Y + Z])$  are loaded into register X as a signed number between -32768 and +32767, inclusive. As mentioned above, our notation  $M_2[k]$  implies that the least significant bit of the address \$Y + \$Z or \$Y + Z is ignored and assumed to be 0.

• LDWU \$X,\$Y,\$Z|Z 'load wyde unsigned'.

Bytes  $M_2[\$Y + \$Z]$  or  $M_2[\$Y + Z]$  are loaded into register X as an unsigned number between 0 and 65535, inclusive.

• LDT \$X,\$Y,\$Z|Z 'load tetra'.

Bytes  $s(M_4[\$Y + \$Z])$  or  $s(M_4[\$Y + Z])$  are loaded into register X as a signed number between -2,147,483,648 and +2,147,483,647, inclusive. As mentioned above, our notation  $M_4[k]$  implies that the two least significant bits of the address \$Y + \$Z or \$Y + Z are ignored and assumed to be 0.

• LDTU \$X,\$Y,\$Z|Z 'load tetra unsigned'.

Bytes  $M_4[\$Y + \$Z]$  or  $M_4[\$Y + Z]$  are loaded into register X as an unsigned number between 0 and 4,294,967,295, inclusive.

• LDO \$X,\$Y,\$Z|Z 'load octa'.

Bytes  $M_8[\$Y + \$Z]$  or  $M_8[\$Y + Z]$  are loaded into register X. As mentioned above, our notation  $M_8[k]$  implies that the three least significant bits of the address \$Y + \$Z or \$Y + Z are ignored and assumed to be 0.

• LDOU \$X,\$Y,\$Z|Z 'load octa unsigned'.

Bytes  $M_8[\$Y + \$Z]$  or  $M_8[\$Y + Z]$  are loaded into register X. There is in fact no difference between the behavior of LDOU and LDO, since an octabyte can be regarded as either signed or unsigned. LDOU is included in MMIX just for completeness and consistency, in spite of the fact that a foolish consistency is the hobgoblin of little minds. (Niklaus Wirth made a strong plea for such consistency in his early critique of System/360; see *JACM* 15 (1968), 37–74.)

• LDHT \$X,\$Y,\$Z|Z 'load high tetra'.

Bytes  $M_4[\$Y + \$Z]$  or  $M_4[\$Y + Z]$  are loaded into the most significant half of register X, and the least significant half is cleared to zero. (One use of "high tetra arithmetic" is to detect overflow easily when tetrabytes are added or subtracted.)

• LDA \$X,\$Y,\$Z|Z 'load address'.

The address \$Y + \$Z or \$Y + Z is loaded into register X. This instruction is simply another name for the ADDU instruction discussed below; it can be used when the programmer is thinking of memory addresses instead of numbers. The MMIX assembler converts LDA into the same OP-code as ADDU.
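The effect of the load instructions on a 64-bit register can be sketched as follows. This is a simplified model, not the MMIX-SIM code: memory is assumed to be a dictionary of bytes (absent bytes read as zero), a register value is an unsigned 64-bit integer, and the names `read_aligned`, `load`, `ldht`, and `lda` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def read_aligned(memory, address, size):
    """Read size bytes (1, 2, 4 or 8), big-endian, ignoring the low address bits."""
    base = address & MASK64 & ~(size - 1)
    return int.from_bytes(bytes(memory.get(base + i, 0) for i in range(size)), "big")


def load(memory, address, size, signed):
    """LDB/LDW/LDT/LDO (signed) or LDBU/LDWU/LDTU/LDOU (unsigned): new register contents."""
    value = read_aligned(memory, address, size)
    if signed and value >> (8 * size - 1):
        value -= 1 << (8 * size)      # sign bit set: the number is negative
    return value & MASK64             # upper bytes become all 0s or all 1s


def ldht(memory, address):
    """LDHT: the tetra goes into the most significant half; the rest is zero."""
    return read_aligned(memory, address, 4) << 32


def lda(y, z):
    """LDA: the address $Y + Z itself, computed like ADDU."""
    return (y + z) & MASK64


# Eight sample bytes stored at locations 8 through 15.
memory = {8 + i: b for i, b in enumerate(bytes.fromhex("89abcdef01234567"))}
```

With this sample memory, `load(memory, 8, 4, True)` models LDT and yields a register whose upper four bytes are all 1s, while `load(memory, 8, 4, False)` models LDTU and leaves them zero.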

**8.** Another family of instructions goes the other way, storing registers into memory. For example, the "store octa immediate" command STO \$3,\$2,17 puts the current contents of register 3 into  $M_8[\$2+17]$ .

• STB \$X,\$Y,\$Z|Z 'store byte'.

The least significant byte of register X is stored into byte M[\$Y + \$Z] or M[\$Y + Z]. An integer overflow exception occurs if \$X is not between -128 and +127. (We will discuss overflow and other kinds of exceptions later.)

• STBU \$X,\$Y,\$Z|Z 'store byte unsigned'.

The least significant byte of register X is stored into byte M[\$Y + \$Z] or M[\$Y + Z]. STBU instructions are the same as STB instructions, except that no test for overflow is made.

• STW \$X,\$Y,\$Z|Z 'store wyde'.

The two least significant bytes of register X are stored into bytes  $M_2[\$Y + \$Z]$  or  $M_2[\$Y + Z]$ . An integer overflow exception occurs if \$X is not between -32768 and +32767.

• STWU \$X,\$Y,\$Z|Z 'store wyde unsigned'.

The two least significant bytes of register X are stored into bytes  $M_2[\$Y + \$Z]$  or  $M_2[\$Y + Z]$ . STWU instructions are the same as STW instructions, except that no test for overflow is made.

• STT \$X,\$Y,\$Z|Z 'store tetra'.

The four least significant bytes of register X are stored into bytes  $M_4[\$Y + \$Z]$  or  $M_4[\$Y + Z]$ . An integer overflow exception occurs if \$X is not between -2,147,483,648 and +2,147,483,647.

• STTU \$X,\$Y,\$Z|Z 'store tetra unsigned'.

The four least significant bytes of register X are stored into bytes  $M_4[\$Y + \$Z]$  or  $M_4[\$Y + Z]$ . STTU instructions are the same as STT instructions, except that no test for overflow is made.

• STO \$X,\$Y,\$Z|Z 'store octa'.

Register X is stored into bytes  $M_8[\$Y + \$Z]$  or  $M_8[\$Y + Z]$ .

• STOU \$X,\$Y,\$Z|Z 'store octa unsigned'.

Identical to STO \$X,\$Y,\$Z|Z.

• STCO X,\$Y,\$Z|Z 'store constant octabyte'.

An octabyte whose value is the unsigned byte X is stored into  $M_8[\$Y + \$Z]$  or  $M_8[\$Y + Z]$ .

• STHT \$X,\$Y,\$Z|Z 'store high tetra'.

The most significant four bytes of register X are stored into  $M_4[\$Y+\$Z]$  or  $M_4[\$Y+Z]$ .
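The store instructions, including the overflow test of the signed variants, can be modeled in the same spirit. The sketch assumes a dictionary-of-bytes memory and an unsigned 64-bit register value, and it reports overflow as a returned flag rather than as a real exception; the names `store`, `stco`, and `stht` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def store(memory, address, size, register, signed=True):
    """Store the low size bytes of a register; return True if overflow would be signaled."""
    base = address & MASK64 & ~(size - 1)             # stores are aligned
    low = register & ((1 << (8 * size)) - 1)
    for i, byte in enumerate(low.to_bytes(size, "big")):
        memory[base + i] = byte
    if not signed:
        return False                                  # STBU, STWU, STTU, STOU never test
    value = register - (1 << 64) if register >> 63 else register
    limit = 1 << (8 * size - 1)
    return not (-limit <= value < limit)              # always False when size is 8


def stco(memory, address, x):
    """STCO: store an octabyte whose value is the unsigned byte X."""
    store(memory, address, 8, x & 0xFF, signed=False)


def stht(memory, address, register):
    """STHT: store the most significant four bytes of the register."""
    store(memory, address, 4, (register & MASK64) >> 32, signed=False)
```

For example, storing a register that contains 200 with the signed byte store writes the byte 200 and reports overflow, whereas a register containing -1 (all 1s) stores the byte 255 without overflow.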

**9. Adding and subtracting.** Once numbers are in registers, we can compute with them. Let's consider addition and subtraction first.

• ADD \$X,\$Y,\$Z|Z 'add'.

The sum \$Y + \$Z or \$Y + Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the sum is  $\geq 2^{63}$  or  $< -2^{63}$ . (We will discuss overflow and other kinds of exceptions later.)

• ADDU \$X,\$Y,\$Z|Z 'add unsigned'.

The sum  $(\$Y + \$Z) \mod 2^{64}$  or  $(\$Y + Z) \mod 2^{64}$  is placed into register X. These instructions are the same as ADD \$X,\$Y,\$Z|Z commands except that no test for overflow is made. (Overflow could be detected if desired by using the command CMPU ovflo,\$X,\$Y after addition, where CMPU means "compare unsigned"; see below.)

• 2ADDU \$X,\$Y,\$Z|Z 'times 2 and add unsigned'.

The sum  $(2\$Y + \$Z) \mod 2^{64}$  or  $(2\$Y + Z) \mod 2^{64}$  is placed into register X.

• 4ADDU \$X,\$Y,\$Z|Z 'times 4 and add unsigned'.

The sum  $(4\$Y + \$Z) \mod 2^{64}$  or  $(4\$Y + Z) \mod 2^{64}$  is placed into register X.

• 8ADDU \$X,\$Y,\$Z|Z 'times 8 and add unsigned'.

The sum  $(8\$Y + \$Z) \mod 2^{64}$  or  $(8\$Y + Z) \mod 2^{64}$  is placed into register X.

• 16ADDU \$X,\$Y,\$Z|Z 'times 16 and add unsigned'.

The sum  $(16\$Y + \$Z) \mod 2^{64}$  or  $(16\$Y + Z) \mod 2^{64}$  is placed into register X.

• SUB \$X,\$Y,\$Z|Z 'subtract'.

The difference \$Y - \$Z or \$Y - Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the difference is  $\geq 2^{63}$  or  $< -2^{63}$ .

• SUBU \$X,\$Y,\$Z|Z 'subtract unsigned'.

The difference (\$Y - \$Z) mod  $2^{64}$  or (\$Y - Z) mod  $2^{64}$  is placed into register X. These two instructions are the same as SUB \$X,\$Y,\$Z|Z except that no test for overflow is made.

• NEG \$X,Y,\$Z|Z 'negate'.

The value Y-\$Z or Y-Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the result is greater than  $2^{63}-1$ . (Notice that in this case MMIX works with the "immediate" constant Y, not register Y. NEG commands are analogous to the immediate variants of other commands, because they save us from having to put one-byte constants into a register. When Y=0, overflow occurs if and only if  $\$Z=-2^{63}$ . The instruction NEG \$X,1,2 has exactly the same effect as NEG \$X,0,1.)

• NEGU \$X,Y,\$Z|Z 'negate unsigned'.

The value  $(Y - \$Z) \mod 2^{64}$  or  $(Y - Z) \mod 2^{64}$  is placed into register X. NEGU instructions are the same as NEG instructions, except that no test for overflow is made.
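The arithmetic rules of this section can be sketched with Python's unbounded integers. Register values are assumed to be unsigned 64-bit integers, overflow is reported as a returned flag instead of an exception, and every function name below is a hypothetical helper rather than part of MMIXware.

```python
MASK64 = (1 << 64) - 1


def as_signed(u):
    """Two's complement reading of an unsigned 64-bit register value."""
    return u - (1 << 64) if u >> 63 else u


def fits(value):
    """True if a signed result lies between -2^63 and 2^63 - 1."""
    return -(1 << 63) <= value < (1 << 63)


def add(y, z):
    """ADD: returns (new register value, overflow flag)."""
    total = as_signed(y) + as_signed(z)
    return total & MASK64, not fits(total)


def sub(y, z):
    """SUB: returns (new register value, overflow flag)."""
    difference = as_signed(y) - as_signed(z)
    return difference & MASK64, not fits(difference)


def neg(y_constant, z):
    """NEG: Y is an immediate constant; Z is a register value or a constant."""
    result = y_constant - as_signed(z)
    return result & MASK64, not fits(result)


def addu(y, z, scale=1):
    """ADDU, or 2ADDU/4ADDU/8ADDU/16ADDU when scale is 2, 4, 8 or 16."""
    return (scale * y + z) & MASK64


def carried(x, y):
    """After x = addu(y, z): unsigned comparison x < y reveals a carry out of bit 63."""
    return x < y
```

In this model `neg(1, 2)` and `neg(0, 1)` give the same result, and `neg(0, 1 << 63)` is the one case with Y = 0 that reports overflow, as the NEG description states.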

10. Bit fiddling. Before looking at multiplication and division, which take longer than addition and subtraction, let's look at some of the other things that MMIX can do fast. There are eighteen instructions for bitwise logical operations on unsigned numbers.

#### • AND \$X,\$Y,\$Z|Z 'bitwise and'.

Each bit of register Y is logically anded with the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bits of the operands are both 1; in symbols,  $\$X = \$Y \land \$Z$  or  $\$X = \$Y \land Z$ . This means in particular that AND \$X,\$Y,Z always zeroes out the seven most significant bytes of register X, because 0s are prefixed to the constant byte Z.

#### • OR \$X,\$Y,\$Z|Z 'bitwise or'.

Each bit of register Y is logically ored with the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 0 if and only if the corresponding bits of the operands are both 0; in symbols,  $\$X = \$Y \vee \$Z$  or  $\$X = \$Y \vee Z$ .

In the special case Z=0, the immediate variant of this command simply copies register Y to register X. The MMIX assembler allows us to write 'SET \$X,\$Y' as a convenient abbreviation for 'OR \$X,\$Y,0'.

• XOR \$X,\$Y,\$Z|Z 'bitwise exclusive-or'.

Each bit of register Y is logically xored with the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 0 if and only if the corresponding bits of the operands are equal; in symbols,  $\$X = \$Y \oplus \$Z$  or  $\$X = \$Y \oplus Z$ .

• ANDN \$X,\$Y,\$Z|Z 'bitwise and-not'.

Each bit of register Y is logically anded with the complement of the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bit of register Y is 1 and the other corresponding bit is 0; in symbols,  $\$X = \$Y \setminus \$Z$  or  $\$X = \$Y \setminus Z$ . (This is the *logical difference* operation; if the operands are bit strings representing sets, we are computing the elements that lie in one set but not the other.)

#### • ORN \$X,\$Y,\$Z|Z 'bitwise or-not'.

Each bit of register Y is logically ored with the complement of the corresponding bit of register Z or of the constant Z, and the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bit of register Y is greater than or equal to the other corresponding bit; in symbols,  $\$X = \$Y \vee \overline{\$Z}$  or  $\$X = \$Y \vee \overline{Z}$ . (This is the complement of  $\$Z \setminus \$Y$  or  $Z \setminus \$Y$ .)

#### • NAND \$X,\$Y,\$Z|Z 'bitwise not-and'.

Each bit of register Y is logically anded with the corresponding bit of register Z or of the constant Z, and the complement of the result is placed in register X. In other words, a bit of register X is set to 0 if and only if the corresponding bits of the operands are both 1; in symbols,  $\$X = \$Y \mathbin{\overline{\wedge}} \$Z$  or  $\$X = \$Y \mathbin{\overline{\wedge}} Z$ .

#### • NOR \$X,\$Y,\$Z|Z 'bitwise not-or'.

Each bit of register Y is logically ored with the corresponding bit of register Z or of the constant Z, and the complement of the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bits of the operands are both 0; in symbols,  $\$X = \$Y \mathbin{\overline{\vee}} \$Z$  or  $\$X = \$Y \mathbin{\overline{\vee}} Z$ .

#### • NXOR \$X,\$Y,\$Z|Z 'bitwise not-exclusive-or'.

Each bit of register Y is logically xored with the corresponding bit of register Z or of the constant Z, and the complement of the result is placed in register X. In other words, a bit of register X is set to 1 if and only if the corresponding bits of the operands are equal; in symbols,  $\$X = \$Y \mathbin{\overline{\oplus}} \$Z$  or  $\$X = \$Y \mathbin{\overline{\oplus}} Z$ .

#### • MUX \$X,\$Y,\$Z|Z 'bitwise multiplex'.

For each bit position j, the jth bit of register X is set either to bit j of register Y or to bit j of the other operand \$Z or Z, depending on whether bit j of the special mask register rM is 1 or 0: if  $M_j$  then  $Y_j$  else  $Z_j$ . In symbols,  $\$X = (\$Y \land rM) \lor (\$Z \land \overline{rM})$  or  $\$X = (\$Y \land rM) \lor (Z \land \overline{rM})$ . (MMIX has several such special registers, associated with instructions that need more than two inputs or produce more than one output.)
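The less familiar bitwise operations are easy to model on Python integers, which may help when checking identities such as the remark that ORN is the complement of a logical difference. The function names below simply echo the mnemonics; they are an illustration, not part of MMIXware, and the mask register rM is passed as an ordinary argument.

```python
MASK64 = (1 << 64) - 1

def andn(y, z):
    return y & ~z & MASK64

def orn(y, z):
    return (y | ~z) & MASK64

def nand(y, z):
    return ~(y & z) & MASK64

def nor(y, z):
    return ~(y | z) & MASK64

def nxor(y, z):
    return ~(y ^ z) & MASK64

def mux(y, z, rM):
    """Model of MUX: bits of y where rM has 1s, bits of z elsewhere."""
    return (y & rM) | (z & ~rM & MASK64)
```

For example, `mux(0xAAAA, 0x5555, 0xFF00)` selects the upper byte from the first operand and the lower byte from the second.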

11. Besides the eighteen bitwise operations, MMIX can also perform unsigned bytewise and biggerwise operations that are somewhat more exotic.

• BDIF \$X,\$Y,\$Z|Z 'byte difference'.

For each byte position j, the jth byte of register X is set to byte j of register Y minus byte j of the other operand \$Z or Z, unless that difference is negative; in the latter case, byte j of \$X is set to zero.

• WDIF \$X,\$Y,\$Z|Z 'wyde difference'.

For each wyde position j, the jth wyde of register X is set to wyde j of register Y minus wyde j of the other operand \$Z or Z, unless that difference is negative; in the latter case, wyde j of \$X is set to zero.

• TDIF \$X,\$Y,\$Z|Z 'tetra difference'.

For each tetra position j, the jth tetra of register X is set to tetra j of register Y minus tetra j of the other operand \$Z or Z, unless that difference is negative; in the latter case, tetra j of \$X is set to zero.

• ODIF \$X,\$Y,\$Z|Z 'octa difference'.

Register X is set to register Y minus the other operand \$Z or Z, unless \$Z or Z exceeds register Y; in the latter case, \$X is set to zero. The operands are treated as unsigned integers.

The BDIF and WDIF commands are useful in applications to graphics or video; TDIF and ODIF are also present for reasons of consistency. For example, if a and b are registers containing 8-byte quantities, their bytewise maxima c and bytewise minima d are computed by

BDIF x,a,b; ADDU c,x,b; SUBU d,a,x;

similarly, the individual "pixel differences" e, namely the absolute values of the differences of corresponding bytes, are computed by

BDIF x,a,b; BDIF y,b,a; OR e,x,y.

To add individual bytes of a and b while clipping all sums to 255 if they don't fit in a single byte, one can say

```
NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;
```

in other words, complement a, apply BDIF, and complement the result. The operations can also be used to construct efficient operations on strings of bytes or wydes.

Exercise: Implement a "nybble difference" instruction that operates in a similar way on sixteen nybbles at a time.

Answer: AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m; BDIF x,x,y; BDIF xx,xx,yy; OR ans,x,xx where register m contains the mask #0f0f0f0f0f0f0f0f.

(The ANDN operation can be regarded as a "bit difference" instruction that operates in a similar way on 64 bits at a time.)
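The BDIF idioms of this section can be exercised with a Python model. The sketch below is an assumption-laden illustration: the helper names are invented, and the max/min step relies on the identities max(a, b) = (a ∸ b) + b and min(a, b) = a − (a ∸ b), where ∸ denotes the saturating difference that BDIF computes in each byte.

```python
MASK64 = (1 << 64) - 1

def bdif(y, z):
    """Model of BDIF: bytewise max(0, y_j - z_j) on eight bytes."""
    x = 0
    for shift in range(0, 64, 8):
        d = ((y >> shift) & 0xFF) - ((z >> shift) & 0xFF)
        x |= max(d, 0) << shift
    return x

def byte_max_min(a, b):
    x = bdif(a, b)
    # No carries or borrows cross byte boundaries in these two steps.
    return (x + b) & MASK64, (a - x) & MASK64

def pixel_diff(a, b):
    return bdif(a, b) | bdif(b, a)

def clipped_sum(a, b):
    acomp = ~a & MASK64              # NOR acomp,a,0
    return ~bdif(acomp, b) & MASK64  # BDIF, then NOR with 0

def nybble_diff(a, b):
    m = 0x0F0F0F0F0F0F0F0F
    low = bdif(a & m, b & m)
    high = bdif(a & ~m & MASK64, b & ~m & MASK64)
    return low | high
```

With single-byte inputs, `clipped_sum(0xFF, 0x01)` stays at `0xFF`, and `nybble_diff(0x53, 0x35)` is `0x20` because only the high nybble has a positive difference.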

12. Three more pairs of bit-fiddling instructions round out the collection of exotics.

• SADD \$X,\$Y,\$Z|Z 'sideways add'.

Each bit of register Y is logically anded with the complement of the corresponding bit of register Z or of the constant Z, and the number of 1 bits in the result is placed in register X. In other words, register X is set to the number of bit positions in which register Y has a 1 and the other operand has a 0; in symbols,  $\$X = \nu(\$Y \setminus \$Z)$  or  $\$X = \nu(\$Y \setminus Z)$ . When the second operand is zero this operation is sometimes called "population counting," because it counts the number of 1s in register Y.
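As a quick illustration, sideways addition amounts to one line of Python; the name `sadd` is merely borrowed from the mnemonic.

```python
MASK64 = (1 << 64) - 1

def sadd(y, z):
    """Model of SADD: count positions where y has a 1 and z has a 0."""
    return bin(y & ~z & MASK64).count("1")
```

Thus `sadd(y, 0)` is the population count of `y`.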

• MOR \$X,\$Y,\$Z|Z 'multiple or'.

Suppose the 64 bits of register Y are indexed as

$$y_{00}y_{01}\dots y_{07}y_{10}y_{11}\dots y_{17}\dots y_{70}y_{71}\dots y_{77};$$

in other words,  $y_{ij}$  is the jth bit of the ith byte, if we number the bits and bytes from 0 to 7 in big-endian fashion from left to right. Let the bits of the other operand, \$Z or Z, be indexed similarly:

$$z_{00}z_{01}\ldots z_{07}z_{10}z_{11}\ldots z_{17}\ldots z_{70}z_{71}\ldots z_{77}.$$

The MOR operation replaces each bit  $x_{ij}$  of register X by the bit

$$y_{0j}z_{i0}\vee y_{1j}z_{i1}\vee\cdots\vee y_{7j}z_{i7}.$$

Thus, for example, if register Z contains the constant #0102040810204080, MOR reverses the order of the bytes in register Y, converting between little-endian and big-endian addressing. (The *i*th byte of \$X depends on the bytes of \$Y as specified by the *i*th byte of \$Z or Z. If we regard 64-bit words as  $8 \times 8$  Boolean matrices, with one byte per column, this operation computes the Boolean product \$X = \$Y \$Z or \$X = \$Y Z. Alternatively, if we regard 64-bit words as  $8 \times 8$  matrices with one byte per *row*, MOR computes the Boolean product \$X = \$Z \$Y or \$X = Z \$Y with operands in the opposite order. The immediate form MOR \$X,\$Y,Z always sets the leading seven bytes of register X to zero; the other byte is set to the bitwise or of whatever bytes of register Y are specified by the immediate operand Z.)

Exercise: Explain how to compute a mask m that is #ff in byte positions where a exceeds b, #00 in all other bytes. Answer: BDIF x,a,b; MOR m,minusone,x; here minusone is a register consisting of all 1s. (Moreover, if we AND this result with #8040201008040201, then MOR with Z=255, we get a one-byte encoding of m.)
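The definition of MOR can be transcribed almost literally into Python, which makes it possible to try the byte-reversal constant and the mask exercise. This is a slow bit-by-bit sketch; `bit` and `mor` are names chosen for this illustration only.

```python
MASK64 = (1 << 64) - 1

def bit(v, i, j):
    """Bit j of byte i, numbering bytes and bits 0..7 from the left."""
    return (v >> (63 - (8 * i + j))) & 1

def mor(y, z):
    """Model of MOR: x_ij is the 'or' over k of y_kj and z_ik."""
    x = 0
    for i in range(8):
        for j in range(8):
            b = 0
            for k in range(8):
                b |= bit(y, k, j) & bit(z, i, k)
            x |= b << (63 - (8 * i + j))
    return x
```

An immediate operand Z is modeled by passing a value below 256, since 0s are prefixed to the constant byte. With a difference pattern `x` obtained from BDIF, `mor(MASK64, x)` spreads every nonzero byte of `x` into a full #ff byte.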

• MXOR \$X,\$Y,\$Z|Z 'multiple exclusive-or'.

This operation is like the Boolean multiplication just discussed, but exclusive-or is used to combine the bits. Thus we obtain a matrix product over the field of two elements instead of a Boolean matrix product. This operation can be used to construct hash functions, among many other things. (The hash functions aren't bad, but they are not "universal" in the sense of *Sorting and Searching*, exercise 6.4–72.)

13. Sixteen "immediate wyde" instructions are available for the common case that a 16-bit constant is needed. In this case the Y and Z fields of the instruction are regarded as a single 16-bit unsigned number YZ.

• SETH \$X,YZ 'set to high wyde'; SETMH \$X,YZ 'set to medium high wyde'; SETML \$X,YZ 'set to medium low wyde'; SETL \$X,YZ 'set to low wyde'.

The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, respectively, and placed into register X. Thus, for example, SETML inserts a given value into the second-least-significant wyde of register X and sets the other three wydes to zero.

• INCH \$X,YZ 'increase by high wyde'; INCMH \$X,YZ 'increase by medium high wyde'; INCML \$X,YZ 'increase by medium low wyde'; INCL \$X,YZ 'increase by low wyde'.

The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, respectively, and added to register X, ignoring overflow; the result is placed back into register X.

If YZ is the hexadecimal constant #8000, the command INCH \$X,YZ complements the most significant bit of register X. We will see below that this can be used to negate a floating point number.

• ORH \$X,YZ 'bitwise or with high wyde'; ORMH \$X,YZ 'bitwise or with medium high wyde'; ORML \$X,YZ 'bitwise or with medium low wyde'; ORL \$X,YZ 'bitwise or with low wyde'.

The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, respectively, and ored with register X; the result is placed back into register X.

Notice that any desired 4-wyde constant GH IJ KL MN can be inserted into a register with a sequence of four instructions such as

```
SETH $X,GH; INCMH $X,IJ; INCML $X,KL; INCL $X,MN;
```

any of these INC instructions could also be replaced by OR.

• ANDNH \$X,YZ 'bitwise and-not high wyde'; ANDNMH \$X,YZ 'bitwise and-not medium high wyde'; ANDNML \$X,YZ 'bitwise and-not medium low wyde'; ANDNL \$X,YZ 'bitwise and-not low wyde'.

The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, respectively, then complemented and anded with register X; the result is placed back into register X.

If YZ is the hexadecimal constant #8000, the command ANDNH \$X,YZ forces the most significant bit of register X to be 0. This can be used to compute the absolute value of a floating point number.
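The four families of immediate wyde instructions differ only in the shift amount and in the combining operation, so a Python sketch can model them with one small table. The names `set_wyde`, `inc_wyde`, `or_wyde`, and `andn_wyde` are hypothetical, and the conversions `f2b` and `b2f` use the host machine's 64-bit floating point format, which is assumed to be IEEE double.

```python
import struct

MASK64 = (1 << 64) - 1
SHIFT = {"H": 48, "MH": 32, "ML": 16, "L": 0}

def set_wyde(pos, yz):
    return yz << SHIFT[pos]

def inc_wyde(x, pos, yz):
    return (x + (yz << SHIFT[pos])) & MASK64

def or_wyde(x, pos, yz):
    return x | (yz << SHIFT[pos])

def andn_wyde(x, pos, yz):
    return x & ~(yz << SHIFT[pos]) & MASK64

def f2b(value):
    """Bit pattern of a Python float, as a 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def b2f(bits):
    """Python float having the given 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

For example, `b2f(inc_wyde(f2b(2.5), "H", 0x8000))` is -2.5, and applying `andn_wyde` with the same constant to the pattern of -2.5 gives 2.5 again.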

14. MMIX knows several ways to shift a register left or right by any number of bits.

• SL \$X,\$Y,\$Z|Z 'shift left'.

The bits of register Y are shifted left by \$Z or Z places, and 0s are shifted in from the right; the result is placed in register X. Register Y is treated as a signed number, but the second operand is treated as an unsigned number. The effect is the same as multiplication by  $2^{\$Z}$  or by  $2^{Z}$ ; an integer overflow exception occurs if the result is  $\geq 2^{63}$  or  $< -2^{63}$ . In particular, if the second operand is 64 or more, register X will become entirely zero, and integer overflow will be signaled unless register Y was zero.

#### • SLU \$X,\$Y,\$Z|Z 'shift left unsigned'.

The bits of register Y are shifted left by \$Z or Z places, and 0s are shifted in from the right; the result is placed in register X. Both operands are treated as unsigned numbers. The SLU instructions are equivalent to SL, except that no test for overflow is made.

#### • SR \$X,\$Y,\$Z|Z 'shift right'.

The bits of register Y are shifted right by \$Z or Z places, and copies of the leftmost bit (the sign bit) are shifted in from the left; the result is placed in register X. Register Y is treated as a signed number, but the second operand is treated as an unsigned number. The effect is the same as division by  $2^{\$Z}$  or by  $2^{Z}$  and rounding down. In particular, if the second operand is 64 or more, register X will become zero if \$Y was nonnegative, -1 if \$Y was negative.

#### • SRU \$X,\$Y,\$Z|Z 'shift right unsigned'.

The bits of register Y are shifted right by \$Z or Z places, and 0s are shifted in from the left; the result is placed in register X. Both operands are treated as unsigned numbers. The effect is the same as unsigned division of a 64-bit number by  $2^{\$Z}$  or by  $2^{Z}$ ; if the second operand is 64 or more, register X will become entirely zero.
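The four shift instructions can be modeled in Python as follows. This is a sketch with invented function names; registers are 64-bit patterns, the shift count is an unsigned integer, and `sl` returns an overflow flag alongside the result.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Interpret a 64-bit pattern as a two's complement integer."""
    return x - (1 << 64) if x >> 63 else x

def sl(y, z):
    """Model of SL: returns (result pattern, overflow flag)."""
    if z >= 64:
        return 0, y != 0
    exact = to_signed(y) << z
    fits = -(1 << 63) <= exact < (1 << 63)
    return exact & MASK64, not fits

def slu(y, z):
    return (y << z) & MASK64 if z < 64 else 0

def sr(y, z):
    # Python's >> on negative integers copies the sign and rounds down.
    return (to_signed(y) >> min(z, 64)) & MASK64

def sru(y, z):
    return y >> z if z < 64 else 0
```

For instance, `sl(1, 63)` signals overflow because $2^{63}$ is not representable as a signed number, whereas `sl(MASK64, 63)` does not, since $-1 \times 2^{63} = -2^{63}$ fits.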


15. Comparisons. Arithmetic and logical operations are nice, but computer programs also need to compare numbers and to change the course of a calculation depending on what they find. MMIX has four comparison instructions to facilitate such decision-making.

#### • CMP \$X,\$Y,\$Z|Z 'compare'.

Register X is set to -1 if register Y is less than register Z or less than the unsigned immediate value Z, using the conventions of signed arithmetic; it is set to 0 if register Y is equal to register Z or equal to the unsigned immediate value Z; otherwise it is set to 1. In symbols, \$X = [\$Y > \$Z] - [\$Y < \$Z] or \$X = [\$Y > Z] - [\$Y < Z].

• CMPU \$X,\$Y,\$Z|Z 'compare unsigned'.

Register X is set to -1 if register Y is less than register Z or less than the unsigned immediate value Z, using the conventions of unsigned arithmetic; it is set to 0 if register Y is equal to register Z or equal to the unsigned immediate value Z; otherwise it is set to 1. In symbols, \$X = [\$Y > \$Z] - [\$Y < \$Z] or \$X = [\$Y > Z] - [\$Y < Z].


16. There also are 32 conditional instructions, which choose quickly between two alternative courses of action.

• CSN \$X,\$Y,\$Z|Z 'conditionally set if negative'.

If register Y is negative (namely if its most significant bit is 1), register X is set to the contents of register Z or to the unsigned immediate value Z. Otherwise nothing happens.

- CSZ \$X,\$Y,\$Z|Z 'conditionally set if zero'.
- CSP \$X,\$Y,\$Z|Z 'conditionally set if positive'.
- CSOD \$X,\$Y,\$Z|Z 'conditionally set if odd'.
- CSNN \$X,\$Y,\$Z|Z 'conditionally set if nonnegative'.
- CSNZ \$X,\$Y,\$Z|Z 'conditionally set if nonzero'.
- CSNP \$X,\$Y,\$Z|Z 'conditionally set if nonpositive'.
- CSEV \$X,\$Y,\$Z|Z 'conditionally set if even'.

These instructions are entirely analogous to CSN, except that register X changes only if register Y is respectively zero, positive, odd, nonnegative, nonzero, nonpositive, or nonodd.

• ZSN \$X,\$Y,\$Z|Z 'zero or set if negative'.

If register Y is negative (namely if its most significant bit is 1), register X is set to the contents of register Z or to the unsigned immediate value Z. Otherwise register X is set to zero.

- ZSZ \$X,\$Y,\$Z|Z 'zero or set if zero'.
- ZSP \$X,\$Y,\$Z|Z 'zero or set if positive'.
- ZSOD \$X,\$Y,\$Z|Z 'zero or set if odd'.
- ZSNN \$X,\$Y,\$Z|Z 'zero or set if nonnegative'.
- ZSNZ \$X,\$Y,\$Z|Z 'zero or set if nonzero'.
- ZSNP \$X,\$Y,\$Z|Z 'zero or set if nonpositive'.
- ZSEV \$X,\$Y,\$Z|Z 'zero or set if even'.

These instructions are entirely analogous to ZSN, except that \$X is set to \$Z or Z if register Y is respectively zero, positive, odd, nonnegative, nonzero, nonpositive, or even; otherwise \$X is set to zero.

Notice that the two instructions CMPU r,s,0 and ZSNZ r,s,1 have the same effect. So do the two instructions CSNP r,s,0 and ZSP r,s,r. So do AND r,s,1 and ZSOD r,s,1.
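The equivalences just noted can be confirmed experimentally with a Python model of the comparison and conditional instructions. Everything below is an illustrative sketch: the table `COND` and the functions `cs` and `zs` are invented names, and each condition is tested on the signed value of register Y.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Interpret a 64-bit pattern as a two's complement integer."""
    return x - (1 << 64) if x >> 63 else x

def cmp_signed(y, z):
    """Model of CMP on two 64-bit patterns."""
    return (to_signed(y) > to_signed(z)) - (to_signed(y) < to_signed(z))

def cmpu(y, z):
    """Model of CMPU on two 64-bit patterns."""
    return (y > z) - (y < z)

COND = {
    "N": lambda v: v < 0,   "Z": lambda v: v == 0,
    "P": lambda v: v > 0,   "OD": lambda v: v & 1 == 1,
    "NN": lambda v: v >= 0, "NZ": lambda v: v != 0,
    "NP": lambda v: v <= 0, "EV": lambda v: v & 1 == 0,
}

def cs(cond, x, y, z):
    """Model of CSxx: the new value of register X."""
    return z if COND[cond](to_signed(y)) else x

def zs(cond, y, z):
    """Model of ZSxx: the new value of register X."""
    return z if COND[cond](to_signed(y)) else 0
```

For any register values `r` and `s`, `cs("NP", r, s, 0)` and `zs("P", s, r)` agree, as do `cmpu(s, 0)` and `zs("NZ", s, 1)`.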

17. Branches and jumps. MMIX ordinarily executes instructions in sequence, proceeding from an instruction in tetrabyte  $M_4[\lambda]$  to the instruction in  $M_4[\lambda+4]$ . But there are several ways to interrupt the normal flow of control, most of which use the Y and Z fields of an instruction as a combined 16-bit YZ field. For example, BNZ \$3,@+4000 (branch if nonzero) is typical: It means that control should skip ahead 1000 instructions to the command that appears 4000 bytes after the BNZ, if register 3 is not equal to zero.

There are eight branch-forward instructions, corresponding to the eight conditions in the CS and ZS commands that we discussed earlier. And there are eight similar branch-backward instructions; for example, BOD \$2,@-4000 (branch if odd) takes control to the instruction that appears 4000 bytes before this BOD command, if register 2 is odd. The numeric OP-code when branching backward is one greater than the OP-code when branching forward; the assembler takes care of this automatically, just as it takes care of changing ADD from 32 to 33 when necessary.

Since branches are relative to the current location, the MMIX assembler treats branch instructions in a special way. Suppose a programmer writes 'BNZ \$3,Case5', where Case5 is the address of an instruction in location l. If this instruction appears in location  $\lambda$ , the assembler first computes the displacement  $\delta = \lfloor (l-\lambda)/4 \rfloor$ . Then if  $\delta$  is nonnegative, the quantity  $\delta$  is placed in the YZ field of a BNZ command, and it should be less than  $2^{16}$ ; if  $\delta$  is negative, the quantity  $2^{16} + \delta$  is placed in the YZ field of a BNZ command with OP-code increased by 1, and  $\delta$  should not be less than  $-2^{16}$ .

The symbol @ used in our examples of BNZ and BOD above is interpreted by the assembler as an abbreviation for "the location of the current instruction." In the following notes we will define pairs of branch commands by writing, for example, 'BNZ \$X,@+4\*YZ[-262144]'; this stands for a branch-forward command that branches to the current location plus four times YZ, as well as for a branch-backward command that branches to the current location plus four times (YZ - 65536).

- BN \$X,@+4\*YZ[-262144] 'branch if negative'.
- BZ \$X,@+4\*YZ[-262144] 'branch if zero'.
- BP \$X,@+4\*YZ[-262144] 'branch if positive'.
- BOD \$X,@+4\*YZ[-262144] 'branch if odd'.
- BNN \$X,@+4\*YZ[-262144] 'branch if nonnegative'.
- BNZ \$X,@+4\*YZ[-262144] 'branch if nonzero'.
- BNP \$X,@+4\*YZ[-262144] 'branch if nonpositive'.
- BEV \$X,@+4\*YZ[-262144] 'branch if even'.

If register X is respectively negative, zero, positive, odd, nonnegative, nonzero, nonpositive, or even, and if this instruction appears in memory location  $\lambda$ , the next instruction is taken from memory location  $\lambda + 4YZ$  (branching forward) or  $\lambda + 4(YZ - 2^{16})$  (branching backward). Thus one can go from location  $\lambda$  to any location between  $\lambda-262,144$  and  $\lambda+262,140$ , inclusive.
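The assembler's treatment of relative branches, as described above, can be sketched in Python. The functions `encode_branch` and `branch_target` are hypothetical helpers; `base_op` stands for the OP-code of the forward form, and the value 0x40 in the usage lines is only a placeholder.

```python
def encode_branch(base_op, lam, target):
    """Return (OP-code, YZ) for a branch from location lam to target."""
    delta = (target - lam) // 4          # floor, as in the text
    if delta >= 0:
        if delta >= 1 << 16:
            raise ValueError("forward branch out of range")
        return base_op, delta
    if delta < -(1 << 16):
        raise ValueError("backward branch out of range")
    return base_op + 1, (1 << 16) + delta

def branch_target(base_op, op, lam, yz):
    """Location reached when the branch at lam is taken."""
    if op == base_op:
        return lam + 4 * yz
    return lam + 4 * (yz - (1 << 16))
```

For example, a branch from location 5000 back to location 1000 is encoded as `(0x41, 64536)`, and `branch_target(0x40, 0x41, 5000, 64536)` recovers 1000.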

Sixteen additional branch instructions called *probable branches* are also provided. They have exactly the same meaning as ordinary branch instructions; for example, PBOD \$2,@-4000 and BOD \$2,@-4000 both go backward 4000 bytes if register 2 is odd. But they differ in running time: On some implementations of MMIX, a branch instruction takes longer when the branch is taken, while a probable branch takes longer when the branch is *not* taken. Thus programmers should use a B instruction when they think branching is relatively unlikely, but they should use PB when they expect branching to occur more often than not. Here is a list of the probable branch commands, for completeness:

- PBN \$X,@+4\*YZ[-262144] 'probable branch if negative'.
- PBZ \$X,@+4\*YZ[-262144] 'probable branch if zero'.
- PBP \$X,@+4\*YZ[-262144] 'probable branch if positive'.
- PBOD \$X,@+4\*YZ[-262144] 'probable branch if odd'.
- PBNN \$X,@+4\*YZ[-262144] 'probable branch if nonnegative'.
- PBNZ \$X,@+4\*YZ[-262144] 'probable branch if nonzero'.
- PBNP \$X,@+4\*YZ[-262144] 'probable branch if nonpositive'.
- PBEV \$X,@+4\*YZ[-262144] 'probable branch if even'.

18. Locations that are relative to the current instruction can be transformed into absolute locations with GETA commands.

• GETA \$X,@+4\*YZ[-262144] 'get address'.

The value  $\lambda + 4YZ$  or  $\lambda + 4(YZ - 2^{16})$  is placed in register X. (The assembly language conventions of branch instructions apply; for example, we can write 'GETA \$X,Addr'.)

19. MMIX also has unconditional jump instructions, which change the location of the next instruction no matter what.

• JMP @+4\*XYZ[-67108864] 'jump'.

A JMP command treats bytes X, Y, and Z as an unsigned 24-bit integer XYZ. It allows a program to transfer control from location  $\lambda$  to any location between  $\lambda - 67{,}108{,}864$  and  $\lambda + 67{,}108{,}860$  inclusive, using relative addressing as in the B and PB commands.

• GO \$X,\$Y,\$Z|Z 'go to location'.

MMIX takes its next instruction from location \$Y + \$Z or \$Y + Z, and continues from there. Register X is set equal to  $\lambda + 4$ , the location of the instruction that would ordinarily have been executed next. (GO is similar to a jump, but it is not relative to the current location. Since GO has the same format as a load or store instruction, a loading routine can treat program labels with the same mechanism that is used to treat references to data.)

An old-fashioned type of subroutine linkage can be implemented by saying either 'GO r,subloc,0' or 'GETA r,@+8; JMP Sub' to enter a subroutine, then 'GO r,r,0' to return. But subroutines are normally entered with the instructions PUSHJ or PUSHGO.

The two least significant bits of the address in a GO command are essentially ignored. They will, however, appear in the value of  $\lambda$  returned by GETA instructions, and in the return-jump register rJ after PUSHJ or PUSHGO instructions are performed, and in the where-interrupted register at the time of an interrupt. Therefore they could be used to send some kind of signal to a subroutine or (less likely) to an interrupt handler.

20. Multiplication and division. Now for some instructions that make MMIX work harder.

• MUL \$X,\$Y,\$Z|Z 'multiply'.

The signed product of the number in register Y by either the number in register Z or the unsigned byte Z replaces the contents of register X. An integer overflow exception can occur, as with ADD or SUB, if the result is less than  $-2^{63}$  or greater than  $2^{63} - 1$ . (Immediate multiplication by powers of 2 can be done more rapidly with the SL instruction.)

• MULU \$X,\$Y,\$Z|Z 'multiply unsigned'.

The lower 64 bits of the unsigned 128-bit product of register Y and either register Z or Z are placed in register X, and the upper 64 bits are placed in the special *himult register* rH. (Immediate multiplication by powers of 2 can be done more rapidly with the SLU instruction, if the upper half is not needed. Furthermore, an instruction like 4ADDU \$X,\$Y,\$Y is faster than MULU \$X,\$Y,5.)

#### • DIV \$X,\$Y,\$Z|Z 'divide'.

The signed quotient of the number in register Y divided by either the number in register Z or the unsigned byte Z replaces the contents of register X, and the signed remainder is placed in the special remainder register rR. An integer divide check exception occurs if the divisor is zero; in that case \$X is set to zero and rR is set to \$Y. An integer overflow exception occurs if the number  $-2^{63}$  is divided by -1; otherwise integer overflow is impossible. The quotient of y divided by z is defined to be  $\lfloor y/z \rfloor$ , and the remainder is defined to be  $y - \lfloor y/z \rfloor z$  (also written  $y \mod z$ ). Thus, the remainder is either zero or has the sign of the divisor. Dividing by  $z = 2^t$  gives exactly the same quotient as shifting right t via the SR command, and exactly the same remainder as anding with z - 1 via the AND command. Division of a positive 63-bit number by a positive constant can be accomplished more quickly by computing the upper half of a suitable unsigned product and shifting it right appropriately.
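Python's `//` and `%` operators happen to follow the same floor convention, so the behavior of DIV is easy to sketch. The function `div` below is a hypothetical model operating on signed integers; in the overflow case it returns the mathematical quotient $2^{63}$, leaving aside what bit pattern the register would then hold.

```python
def div(y, z):
    """Model of DIV: returns (quotient, remainder, exception name or None)."""
    if z == 0:
        return 0, y, "divide check"
    q, r = y // z, y % z      # floor quotient; remainder has the sign of z
    if q > (1 << 63) - 1:     # happens only for -2^63 divided by -1
        return q, r, "overflow"
    return q, r, None
```

For example, `div(-7, 2)` gives quotient -4 and remainder 1, while `div(7, -2)` gives quotient -4 and remainder -1; dividing by 8 agrees with shifting right by 3 and anding with 7.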

#### • DIVU \$X,\$Y,\$Z|Z 'divide unsigned'.

The unsigned 128-bit number obtained by prefixing the special dividend register rD to the contents of register Y is divided either by the unsigned number in register Z or by the unsigned byte Z, and the quotient is placed in register X. The remainder is placed in the remainder register rR. However, if rD is greater than or equal to the divisor (and in particular if the divisor is zero), then \$X is set to rD and rR is set to \$Y. (Unsigned arithmetic never signals an exceptional condition, even when dividing by zero.) If rD is zero, unsigned division by  $z=2^t$  gives exactly the same quotient as shifting right t via the SRU command, and exactly the same remainder as anding with z-1 via the AND command. Section 4.3.1 of Seminumerical Algorithms explains how to use unsigned division to obtain the quotient and remainder of extremely large numbers.
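The unsigned multiply and divide instructions fit together: the 128-bit product delivered by MULU in rH and \$X is exactly the kind of dividend that DIVU accepts in rD and \$Y. The following Python sketch, with invented function names and with the special registers passed and returned explicitly, illustrates this.

```python
MASK64 = (1 << 64) - 1

def mulu(y, z):
    """Model of MULU: returns (low 64 bits for $X, high 64 bits for rH)."""
    p = y * z
    return p & MASK64, p >> 64

def divu(rD, y, z):
    """Model of DIVU: returns (value for $X, value for rR)."""
    if rD >= z:               # includes the case of a zero divisor
        return rD, y
    n = (rD << 64) | y        # 128-bit dividend
    return n // z, n % z      # the quotient fits in 64 bits because rD < z
```

If `lo, hi = mulu(a, b)` with `b` nonzero, then `divu(hi, lo, b)` returns `(a, 0)`.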

21. Floating point computations. Floating point arithmetic conforming to the famous IEEE/ANSI Standard 754 is provided for arbitrary 64-bit numbers. The IEEE standard refers to such numbers as "double format" quantities, but MMIX calls them simply floating point numbers because 64-bit quantities are the norm.

A positive floating point number has 53 bits of precision and can range from approximately  $10^{-308}$  to  $10^{308}$ . "Subnormal numbers" between  $10^{-324}$  and  $10^{-308}$  can also be represented, but with fewer bits of precision. Floating point numbers can be infinite, and they satisfy such identities as  $1.0/\infty = +0.0, -2.8 \times \infty = -\infty$ . Floating point quantities can also be "Not-a-Numbers" or NaNs, which are further classified into signaling NaNs and quiet NaNs.

Five kinds of exceptions can occur during floating point computations, and they each have code letters: Floating overflow (O) or underflow (U); floating divide by zero (Z); floating inexact (X); and floating invalid (I). For example, the multiplication of sufficiently small integers causes no exceptions, and the division of 91.0 by 13.0 is also exception-free, but the division 1.0/3.0 is inexact. The multiplication of extremely large or extremely small floating point numbers is inexact and it also causes overflow or underflow. Invalid results occur when taking the square root of a negative number; mathematicians can remember the I exception by relating it to the square root of -1.0. Invalid results also occur when trying to convert infinity or a quiet NaN to a fixed-point integer, or when any signaling NaN is encountered, or when mathematically undefined operations like  $\infty - \infty$  or 0/0 are requested. (Programmers can be sure that they have not erroneously used uninitialized floating point data if they initialize all their variables to signaling NaN values.)

Four different rounding modes for inexact results are available: round to nearest (and to even in case of ties); round off (toward zero); round up (toward  $+\infty$ ); or round down (toward  $-\infty$ ). MMIX has a special arithmetic status register rA that specifies the current rounding mode and the user's current preferences for exception handling.

IEEE standard arithmetic provides an excellent foundation for scientific calculations, and it will be thoroughly explained in the fourth edition of Seminumerical Algorithms, Section 4.2. For our present purposes, we need not study all the details; but we do need to specify MMIX's behavior with respect to several things that are not completely defined by the standard. For example, the IEEE standard does not fully define the result of operations with NaNs.

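When an octabyte represents a floating point number, the leftmost bit is the sign; then come 11 bits for an exponent e; and the remaining 52 bits are the fraction part f. We regard e as an integer between 0 and 2047, and we regard f as a fraction between 0 and  $1-2^{-52}$ . Each octabyte has the following significance:

$$
\begin{array}{ll}
\pm 0.0, & \text{if } e=f=0 \text{ (zero);} \\
\pm 2^{-1022}f, & \text{if } e=0 \text{ and } f>0 \text{ (subnormal);} \\
\pm 2^{e-1023}(1+f), & \text{if } 0< e<2047 \text{ (normal);} \\
\pm \infty, & \text{if } e=2047 \text{ and } f=0 \text{ (infinite);} \\
\pm \text{NaN}(f), & \text{if } e=2047 \text{ and } 0< f<1/2 \text{ (signaling NaN);} \\
\pm \text{NaN}(f), & \text{if } e=2047 \text{ and } f\geq 1/2 \text{ (quiet NaN).}
\end{array}
$$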

Notice that +0.0 is distinguished from -0.0; this fact is important for interval arithmetic.

Exercise: What 64 bits represent the floating point number 1.0? Answer: We want e = 1023 and f = 0, so the answer is #3ff0000000000000.
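The case analysis above translates directly into a Python decoder. This sketch assumes that e is the 11-bit field following the sign bit and that f is the remaining 52 bits read as a fraction; the function names are invented, and `bits_of` relies on the host machine using the same 64-bit format.

```python
import struct
from fractions import Fraction

def classify(bits):
    """Split an octabyte into (sign, e, f, kind)."""
    s = bits >> 63
    e = (bits >> 52) & 0x7FF
    f = Fraction(bits & ((1 << 52) - 1), 1 << 52)
    if e == 0:
        kind = "zero" if f == 0 else "subnormal"
    elif e < 2047:
        kind = "normal"
    elif f == 0:
        kind = "infinite"
    elif f < Fraction(1, 2):
        kind = "signaling NaN"
    else:
        kind = "quiet NaN"
    return s, e, f, kind

def value(bits):
    """Exact value of a zero, subnormal, or normal octabyte."""
    s, e, f, kind = classify(bits)
    if e == 2047:
        raise ValueError("infinite or NaN")
    sign = -1 if s else 1
    if e == 0:
        return sign * f * Fraction(2) ** -1022
    return sign * (1 + f) * Fraction(2) ** (e - 1023)

def bits_of(x):
    """Bit pattern of a Python float, as a 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]
```

In particular `bits_of(1.0)` is `0x3FF0000000000000`, which `classify` reports as a normal number with e = 1023 and f = 0.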

22. The seven IEEE floating point arithmetic operations (addition, subtraction, multiplication, division, remainder, square root, and nearest-integer) all share common features, called the standard floating point conventions in the discussion below: The operation is performed on floating point numbers found in two registers, \$Y and \$Z, except that square root and integerization involve only one operand. If neither input operand is a NaN, we first determine the exact result, then round it using the current rounding mode found in special register rA. Infinite results are exact and need no rounding. A floating overflow exception occurs if the rounded result is finite but needs an exponent greater than 2046. A floating underflow exception occurs if the rounded result needs an exponent less than 1 and either (i) the unrounded result cannot be represented exactly as a subnormal number or (ii) the "floating underflow trip" is enabled in rA. (Trips are discussed below.) NaNs are treated specially as follows: If either \$Y or \$Z is a signaling NaN, an invalid exception occurs and the NaN is quieted by adding 1/2 to its fraction part. Then if \$Z is a quiet NaN, the result is set to \$Z; otherwise if \$Y is a quiet NaN, the result is set to \$Y. (Registers \$Y and \$Z do not actually change.)

#### • FADD \$X,\$Y,\$Z 'floating add'.

The floating point sum \$Y + \$Z is computed by the standard floating point conventions just described, and placed in register X. An invalid exception occurs if the sum is  $(+\infty) + (-\infty)$  or  $(-\infty) + (+\infty)$ ; in that case the result is NaN(1/2) with the sign of \$Z. If the sum is exactly zero and the current mode is not rounding-down, the result is +0.0 except that (-0.0) + (-0.0) = -0.0. If the sum is exactly zero and the current mode is rounding-down, the result is -0.0 except that (+0.0) + (+0.0) = +0.0.

Floating point underflow cannot occur unless the U-trip has been enabled, because any underflowing result of floating point addition can be represented exactly as a subnormal number.

Silly but instructive exercise: Find all pairs of numbers (\$Y,\$Z) such that the commands FADD \$X,\$Y,\$Z and ADDU \$X,\$Y,\$Z both produce the same result in \$X (although FADD may cause floating exceptions). Answer: Of course \$Y or \$Z could be zero, if the other one is not a signaling NaN. Or one could be signaling and the other #0008000000000000. Other possibilities occur when they are both positive and less than #0010000000000001; or when one operand is #0000000000000001 and the other is an odd number between #0020000000000001 and #002fffffffffffff inclusive (rounding to nearest). And still more surprising possibilities exist, such as #7f6001b4c67bc809+#ff5ffb6a4534a3f7. All eight families of solutions will be revealed some day in the fourth edition of Seminumerical Algorithms.
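Individual answers to this exercise can be tested on any machine whose native floating point format is the 64-bit IEEE format with round-to-nearest as the default mode, which is assumed in the Python sketch below. The helper names are invented; NaN operands are deliberately left out, because the way a host quiets signaling NaNs need not match MMIX.

```python
import struct

MASK64 = (1 << 64) - 1

def b2f(bits):
    """Python float having the given 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def f2b(value):
    """Bit pattern of a Python float, as a 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def fadd_bits(y, z):
    """Floating sum of two patterns, using the host's arithmetic."""
    return f2b(b2f(y) + b2f(z))

def addu(y, z):
    """Integer sum of two patterns modulo 2^64."""
    return (y + z) & MASK64

def agrees(y, z):
    return fadd_bits(y, z) == addu(y, z)
```

The surprising pair quoted in the answer passes this test: both the floating sum and the integer sum come out as #7ebffd1f0bb06c00.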

#### • FSUB \$X,\$Y,\$Z 'floating subtract'.

This instruction is equivalent to FADD, but with the sign of \$Z negated unless \$Z is a NaN.

#### • FMUL \$X,\$Y,\$Z 'floating multiply'.

The floating point product  $\$Y \times \$Z$  is computed by the standard floating point conventions, and placed in register X. An invalid exception occurs if the product is  $(\pm 0.0) \times (\pm \infty)$  or  $(\pm \infty) \times (\pm 0.0)$ ; in that case the result is  $\pm \text{NaN}(1/2)$ . No exception occurs for the product  $(\pm \infty) \times (\pm \infty)$ . If neither \$Y nor \$Z is a NaN, the sign of the result is the product of the signs of \$Y and \$Z.

#### • FDIV \$X,\$Y,\$Z 'floating divide'.

The floating point quotient \$Y/\$Z is computed by the standard floating point conventions, and placed in register X. A floating divide by zero exception occurs if the quotient is (normal or subnormal)/( $\pm 0.0$ ). An invalid exception occurs if the quotient is  $(\pm 0.0)/(\pm 0.0)$  or  $(\pm \infty)/(\pm \infty)$ ; in that case the result is  $\pm \text{NaN}(1/2)$ . No exception occurs for the quotient  $(\pm \infty)/(\pm 0.0)$ . If neither \$Y nor \$Z is a NaN, the sign of the result is the product of the signs of \$Y and \$Z.

If a floating point number in register X is known to have an exponent between 2 and 2046, the instruction INCH \$X,#fff0 will divide it by 2.0.
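The halving trick works because adding #fff0 to the high wyde subtracts 1 from the exponent field, modulo $2^{64}$. A minimal sketch of that arithmetic follows; the function `inch` is a hypothetical stand-in for the instruction, and the host is assumed to use IEEE doubles.

```python
import struct

MASK64 = (1 << 64) - 1


def float_to_bits(x):
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def inch(x_bits, yz):
    """Model of INCH: add the 16-bit constant yz, shifted left 48, modulo 2**64."""
    return (x_bits + (yz << 48)) & MASK64


half_of_six = inch(float_to_bits(6.0), 0xFFF0)
half_of_minus_ten = inch(float_to_bits(-10.0), 0xFFF0)
```

Here `half_of_six` is the bit pattern of 3.0 and `half_of_minus_ten` is the bit pattern of -5.0; the sign bit is unaffected because the carry out of the exponent field is discarded.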

#### • FREM \$X,\$Y,\$Z 'floating remainder'.

The floating point remainder \$Y rem \$Z is computed by the standard floating point conventions, and placed in register X. (The IEEE standard defines the remainder to be  $\$Y - n \times \$Z$ , where n is the nearest integer to \$Y/\$Z, and n is an even integer in case of ties. This is not the same as the remainder \$Y mod \$Z computed by DIV or DIVU.) A zero remainder has the sign of \$Y. An invalid exception occurs if \$Y is infinite and/or \$Z is zero; in that case the result is NaN(1/2) with the sign of \$Y.
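The difference between the IEEE remainder and the familiar mod operation is easy to see numerically. The sketch below uses Python's `math.remainder` (available since Python 3.7), which implements the IEEE-style remainder; the wrapper name `frem` is illustrative only, and Python signals the invalid cases by raising `ValueError` instead of returning a NaN.

```python
import math


def frem(y, z):
    """IEEE remainder y - n*z, with n the nearest integer to y/z (ties to even)."""
    return math.remainder(y, z)


examples = {
    "5 rem 3": frem(5.0, 3.0),   # n = 2, so 5 - 6 = -1
    "5 mod 3": 5.0 % 3.0,        # the floor-style remainder is 2
    "7 rem 2": frem(7.0, 2.0),   # 3.5 is a tie; n = 4 (even), so 7 - 8 = -1
    "5 rem 2": frem(5.0, 2.0),   # 2.5 is a tie; n = 2 (even), so 5 - 4 = 1
}
```

A zero remainder keeps the sign of the first operand: `frem(-6.0, 3.0)` is -0.0.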

#### • FSQRT \$X,\$Z 'floating square root'.

The floating point square root  $\sqrt{\$Z}$  is computed by the standard floating point conventions, and placed in register X. An invalid exception occurs if \$Z is a negative number (either infinite, normal, or subnormal); in that case the result is -NaN(1/2). No exception occurs when taking the square root of -0.0 or  $+\infty$ . In all cases the sign of the result is the sign of \$Z.

The Y field of FSQRT can be used to specify a special rounding mode, as explained below.

#### • FINT \$X,\$Z 'floating integer'.

The floating point number in register Z is rounded (if necessary) to a floating point integer, using the current rounding mode, and placed in register X. Infinite values and quiet NaNs are not changed; signaling NaNs are treated as in the standard conventions. Floating point overflow and underflow exceptions cannot occur.

The Y field of FINT can be used to specify a special rounding mode, as explained below.

23. Besides doing arithmetic, we need to compare floating point numbers with each other, taking proper account of NaNs and the fact that -0.0 should be considered equal to +0.0. The following instructions are analogous to the comparison operators CMP and CMPU that we have used for integers.

#### • FCMP \$X,\$Y,\$Z 'floating compare'.

Register X is set to -1 if \$Y < \$Z according to the conventions of floating point arithmetic, or to 1 if \$Y > \$Z according to those conventions. Otherwise it is set to 0. An invalid exception occurs if either \$Y or \$Z is a NaN; in such cases the result is zero.

#### • FEQL \$X,\$Y,\$Z 'floating equal to'.

Register X is set to 1 if \$Y = \$Z according to the conventions of floating point arithmetic. Otherwise it is set to 0. The result is zero if either \$Y or \$Z is a NaN, even if a NaN is being compared with itself. However, no invalid exception occurs, not even when \$Y or \$Z is a signaling NaN. (Perhaps MMIX differs slightly from the IEEE standard in this regard, but programmers sometimes need to look at signaling NaNs without encountering side effects. Programmers who insist on raising an invalid exception whenever a signaling NaN is compared for floating equality should issue the instructions FSUB \$X,\$Y,\$Y; FSUB \$X,\$Z,\$Z just before saying FEQL \$X,\$Y,\$Z.)

Suppose w, x, y, and z are unsigned 64-bit integers with  $w < x < 2^{63} \le y < z$ . Thus, the leftmost bits of w and x are 0, while the leftmost bits of y and z are 1. Then we have w < x < y < z when these numbers are considered as unsigned integers, but y < z < w < x when they are considered as signed integers, because y and z are negative. Furthermore, we have  $z < y \le w < x$  when these same 64-bit quantities are considered to be floating point numbers, assuming that no NaNs are present, because the leftmost bit of a floating point number represents its sign and the remaining bits represent its magnitude. The case y = w occurs in floating point comparison if and only if y is the representation of -0.0 and w is the representation of +0.0.
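The three orderings can be observed on concrete bit patterns. In this sketch w and x are the patterns of +0.0 and 1.0, while y and z are the patterns of -0.0 and -1.0; the helper names are illustrative, and the host is assumed to use IEEE doubles.

```python
import struct


def as_signed(b):
    """Interpret a 64-bit pattern as a two's complement integer."""
    return b - (1 << 64) if b >> 63 else b


def as_float(b):
    """Interpret a 64-bit pattern as an IEEE double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


w = 0x0000000000000000   # +0.0
x = 0x3FF0000000000000   # 1.0
y = 0x8000000000000000   # -0.0
z = 0xBFF0000000000000   # -1.0

unsigned_order = sorted([w, x, y, z])
signed_order = sorted([w, x, y, z], key=as_signed)
```

As unsigned integers the order is w, x, y, z; as signed integers it is y, z, w, x; and as floating point numbers we get z < y = w < x, with the equality arising only because y and w represent the two zeros.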

#### • FUN \$X,\$Y,\$Z 'floating unordered'.

Register X is set to 1 if \$Y and \$Z are unordered according to the conventions of floating point arithmetic (namely, if either one is a NaN); otherwise register X is set to 0. No invalid exception occurs, not even when \$Y or \$Z is a signaling NaN.

The IEEE standard discusses 26 different possible relations on floating point numbers; MMIX implements 14 of them with single instructions, followed by a branch (or by a ZS to make a "pure" 0 or 1 result); all 26 can be evaluated with a sequence of at most four MMIX commands and a subsequent branch. The hardest case to handle is '?>=' (unordered or greater or equal, to be computed without exceptions), for which the following sequence makes \$X $\ge$ 0 if and only if \$Y ?>= \$Z:

```
   FUN  $255,$Y,$Z
   BP   $255,1F    % skip ahead if unordered
   FCMP $X,$Y,$Z   % $X=[$Y>$Z]-[$Y<$Z]; no exceptions will arise
1H CSNZ $X,$255,1  % $X=1 if unordered
```

24. Exercise: Suppose MMIX had no FINT instruction. Explain how to obtain the equivalent of FINT \$X,\$Z using other instructions. Your program should do the proper thing with respect to NaNs and exceptions. (For example, it should cause an invalid exception if and only if \$Z is a signaling NaN; it should cause an inexact exception only if \$Z needs to be rounded to another value.)

Answer: (The assembler prefixes hexadecimal constants by #.)

```
   SETH  $0,#4330   % $0=2^52
   SET   $1,$Z      % $1=$Z
   ANDNH $1,#8000   % $1=abs($Z)
   ANDN  $2,$Z,$1   % $2=signbit($Z)
   FUN   $3,$Z,$Z   % $3=[$Z is a NaN]
   BNZ   $3,1F      % skip ahead if $Z is a NaN
   FCMP  $3,$1,$0   % $3=[abs($Z)>2^52]-[abs($Z)<2^52]
   CSNN  $0,$3,0    % set $0=0 if $3>=0
   OR    $0,$2,$0   % attach sign of $Z to $0
1H FADD  $1,$Z,$0   % $1=$Z+$0
   FSUB  $1,$1,$0   % $1=$1-$0
   OR    $X,$1,$2   % make sure minus zero isn't lost
```

This program handles most cases of interest by adding and subtracting  $\pm 2^{52}$  using floating point arithmetic. It would be incorrect to do this in all cases; for example, such addition/subtraction might fail to give the correct answer when \$Z is a small negative quantity (if rounding toward zero), or when \$Z is a number like  $2^{105} + 2^{53}$  (if rounding to nearest).
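The same idea can be tried in a high-level language. The sketch below is a rough Python analogue of the program's strategy, restricted to round-to-nearest mode (the only mode ordinary Python arithmetic offers) and ignoring exception flags; `fint_nearest` is a name invented for this illustration.

```python
import math

TWO52 = 2.0 ** 52


def fint_nearest(z):
    """Round z to a floating point integer by adding and subtracting +-2**52."""
    if math.isnan(z) or abs(z) >= TWO52:
        return z                        # NaNs and large values pass through
    c = math.copysign(TWO52, z)         # attach the sign of z to 2**52
    return math.copysign((z + c) - c, z)   # reattach sign so -0.0 isn't lost


samples = [fint_nearest(v) for v in (2.5, 3.5, 0.3, -0.3)]
```

The results are 2.0, 4.0, 0.0, and -0.0: ties go to the even integer, and the final `copysign` plays the role of the closing OR instruction, preserving the sign of a zero result.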

**25.** MMIX goes beyond the IEEE standard to define additional relations between floating point numbers, as suggested by the theory in Section 4.2.2 of Seminumerical Algorithms. Given a nonnegative number  $\epsilon$ , each normal floating point number u = (f, e) has a neighborhood

$$N_{\epsilon}(u) = \{x \mid |x - u| \le 2^{e - 1022} \epsilon \};$$

we also define  $N_{\epsilon}(0) = \{0\}$ ,  $N_{\epsilon}(u) = \{x \mid |x - u| \leq 2^{-1021}\epsilon\}$  if u is subnormal;  $N_{\epsilon}(\pm \infty) = \{\pm \infty\}$  if  $\epsilon < 1$ ,  $N_{\epsilon}(\pm \infty) = \{\text{everything except } \mp \infty\}$  if  $1 \leq \epsilon < 2$ ,  $N_{\epsilon}(\pm \infty) = \{\text{everything}\}$  if  $\epsilon \geq 2$ . Then we write

$$\begin{aligned}
u \prec v \ (\epsilon), &\quad \text{if } u < N_{\epsilon}(v) \text{ and } N_{\epsilon}(u) < v; \\
u \sim v \ (\epsilon), &\quad \text{if } u \in N_{\epsilon}(v) \text{ or } v \in N_{\epsilon}(u); \\
u \approx v \ (\epsilon), &\quad \text{if } u \in N_{\epsilon}(v) \text{ and } v \in N_{\epsilon}(u); \\
u \succ v \ (\epsilon), &\quad \text{if } u > N_{\epsilon}(v) \text{ and } N_{\epsilon}(u) > v.
\end{aligned}$$

#### • FCMPE \$X,\$Y,\$Z 'floating compare (with respect to epsilon)'.

Register X is set to -1 if  $\$Y \prec \$Z$  (rE) according to the conventions of Seminumerical Algorithms as stated above; it is set to 1 if  $\$Y \succ \$Z$  (rE) according to those conventions; otherwise it is set to 0. Here rE is a floating point number in the special epsilon register, which is used only by the floating point comparison operations FCMPE, FEQLE, and FUNE. An invalid exception occurs, and the result is zero, if any of \$Y, \$Z, or rE are NaN, or if rE is negative. If no such exception occurs, exactly one of the three conditions  $\$Y \prec \$Z$ ,  $\$Y \sim \$Z$ ,  $\$Y \succ \$Z$  holds with respect to rE.

#### • FEQLE \$X,\$Y,\$Z 'floating equivalent (with respect to epsilon)'.

Register X is set to 1 if \$Y  $\approx$  \$Z (rE) according to the conventions of Seminumerical Algorithms as stated above; otherwise it is set to 0. An invalid exception occurs, and the result is zero, if any of \$Y, \$Z, or rE are NaN, or if rE is negative. Notice that the relation \$Y  $\approx$  \$Z computed by FEQLE is stronger than the relation \$Y  $\sim$  \$Z computed by FCMPE.

#### • FUNE \$X,\$Y,\$Z 'floating unordered (with respect to epsilon)'.

Register X is set to 1 if \$Y, \$Z, or rE are exceptional as discussed for FCMPE and FEQLE; otherwise it is set to 0. No exceptions occur, even if \$Y, \$Z, or rE is a signaling NaN.

Exercise: What floating point numbers does FCMPE regard as  $\sim 0.0$  with respect to  $\epsilon = 1/2$ , when no exceptions arise? Answer: Zero, subnormal numbers, and normal numbers with f = 0. (The numbers similar to zero with respect to  $\epsilon$  are zero, subnormal numbers with  $f \leq 2\epsilon$ , normal numbers with  $f \leq 2\epsilon - 1$ , and  $\pm \infty$  if  $\epsilon \geq 1$ .)
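The neighborhood definitions translate directly into exact rational arithmetic. The following sketch covers finite operands only (no infinities, NaNs, or negative epsilon) and is meant as an illustration of the definitions, not as a description of how MMIX computes them; the function names mirror the instructions but are otherwise invented.

```python
import math
from fractions import Fraction


def radius(u, eps):
    """Radius of N_eps(u) for finite u: 2**(e-1022)*eps, or 2**(-1021)*eps if subnormal."""
    if u == 0:
        return Fraction(0)              # N_eps(0) = {0}
    _, k = math.frexp(u)                # |u| = m * 2**k with 0.5 <= m < 1
    return Fraction(eps) * Fraction(2) ** max(k, -1021)


def fcmpe(u, v, eps):
    """-1 if u is definitely less than v, 1 if definitely greater, 0 if similar."""
    d = Fraction(v) - Fraction(u)
    r = max(radius(u, eps), radius(v, eps))
    if d > r:
        return -1
    if -d > r:
        return 1
    return 0


def feqle(u, v, eps):
    """1 if u and v are approximately equal (each lies in the other's neighborhood)."""
    d = abs(Fraction(v) - Fraction(u))
    return int(d <= min(radius(u, eps), radius(v, eps)))
```

With $\epsilon = 2^{-10}$, the values 1.997 and 2.0 are similar but not approximately equal, because 1.997 lies in the larger neighborhood of 2.0 while 2.0 lies outside the smaller neighborhood of 1.997. With $\epsilon = 1/2$, `fcmpe(4.0, 0.0, 0.5)` is 0 but `fcmpe(5.0, 0.0, 0.5)` is 1, in agreement with the answer to the exercise.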
**26.** The IEEE standard also defines 32-bit floating point quantities, which it calls "single format" numbers. MMIX calls them *short floats*, and converts between 32-bit and 64-bit forms when such numbers are loaded from memory or stored into memory. A short float consists of a sign bit followed by an 8-bit exponent and a 23-bit fraction. After it has been loaded into one of MMIX's registers, its 52-bit fraction part will have 29 trailing zero bits, and its exponent e will be one of the 256 values 0,  $(01110000001)_2 = 897$ ,  $(01110000010)_2 = 898$ , ...,  $(10001111110)_2 = 1150$ , or 2047, unless it was subnormal; a subnormal short float loads into a normal number with  $874 \le e \le 896$ .

#### • LDSF \$X,\$Y,\$Z|Z 'load short float'.

Register X is set to the 64-bit floating point number corresponding to the 32-bit floating point number represented by  $M_4[\$Y + \$Z]$  or  $M_4[\$Y + Z]$ . No arithmetic exceptions occur, not even if a signaling NaN is loaded.
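The exponent ranges quoted for short floats can be confirmed with a small experiment. The sketch assumes the host converts IEEE single format to double format exactly, as Python's `struct` module does on IEEE hardware; `ldsf` here is only a model of the value conversion for non-NaN inputs, not of memory access.

```python
import struct


def ldsf(bits32):
    """Widen a 32-bit short float pattern to the corresponding 64-bit pattern."""
    x = struct.unpack(">f", struct.pack(">I", bits32))[0]
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def exponent(bits64):
    """Extract the 11-bit exponent field e."""
    return (bits64 >> 52) & 0x7FF


smallest_normal = exponent(ldsf(0x00800000))      # short exponent field 1
largest_finite = exponent(ldsf(0x7F7FFFFF))       # short exponent field 254
smallest_subnormal = exponent(ldsf(0x00000001))
largest_subnormal = exponent(ldsf(0x007FFFFF))
```

These four values come out as 897, 1150, 874, and 896 respectively, and every loaded pattern has its low-order 29 bits equal to zero.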

#### • STSF \$X,\$Y,\$Z|Z 'store short float'.

The value obtained by rounding register X to a 32-bit floating point number is placed in  $M_4[\$Y + \$Z]$  or  $M_4[\$Y + Z]$ . Rounding is done with the current rounding mode, in a manner exactly analogous to the standard conventions for rounding 64-bit results, except that the precision and exponent range are limited. In particular, floating overflow, underflow, and inexact exceptions might occur; a signaling NaN will trigger an invalid exception and it will become quiet. The fraction part of a NaN is truncated if necessary to a multiple of  $2^{-23}$ , by ignoring the least significant 29 bits.

If we load any two short floats and operate on them once with either FADD, FSUB, FMUL, FDIV, FREM, FSQRT, or FINT, and if we then store the result as a short float, we obtain the results required by the IEEE standard for single format arithmetic, because the double format can be shown to have enough precision to avoid any problems of "double rounding." But programmers are usually better off sticking to 64-bit arithmetic unless they have a strong reason to emulate the precise behavior of a 32-bit computer; 32 bits do not offer much precision.

**27.** Of course we need to be able to go back and forth between integers and floating point values.

#### • FIX \$X,\$Z 'convert floating to fixed'.

The floating point number in register Z is converted to an integer as with the FINT instruction, and the resulting integer (mod  $2^{64}$ ) is placed in register X. An invalid exception occurs if \$Z is infinite or a NaN; in that case \$X is simply set equal to \$Z. A float-to-fix exception occurs if the result is less than  $-2^{63}$  or greater than  $2^{63} - 1$ .

#### • FIXU \$X,\$Z 'convert floating to fixed unsigned'.

This instruction is identical to FIX except that no float-to-fix exception occurs.

#### • FLOT \$X,\$Z|Z 'convert fixed to floating'.

The integer in \$Z or the immediate constant Z is converted to the nearest floating point value (using the current rounding mode) and placed in register X. A floating inexact exception occurs if rounding is necessary.

#### • FLOTU \$X,\$Z|Z 'convert fixed to floating unsigned'.

FLOTU is like FLOT, but \$Z is treated as an unsigned integer.

#### • SFLOT \$X,\$Z|Z 'convert fixed to short float'; SFLOTU \$X,\$Z|Z 'convert fixed to short float unsigned'.

The SFLOT instructions are like the FLOT instructions, except that they round to a floating point number whose fraction part is a multiple of  $2^{-23}$ . (Thus, the resulting value will not be changed by a "store short float" instruction.) Such conversions appear in MMIX's repertoire only to establish complete conformance with the IEEE standard; a programmer needs them only when emulating a 32-bit machine.

28. Since the variants of FIX and FLOT involve only one input operand (\$Z or Z), their Y field is normally zero. A programmer can, however, force the mode of rounding used with these commands by setting

$$\begin{array}{lll} Y=1, & \texttt{ROUND\_OFF} & \text{(none)}; \\ Y=2, & \texttt{ROUND\_UP} & \text{(away from zero)}; \\ Y=3, & \texttt{ROUND\_DOWN} & \text{(toward zero)}; \\ Y=4, & \texttt{ROUND\_NEAR} & \text{(to closest)}; \end{array}$$

for example, the instruction FLOTU \$X,ROUND\_OFF,\$Z will set the exponent e of register X to 1086-l if \$Z is a nonzero quantity with l leading zero bits. Thus we can count leading zeros by continuing with SETL \$0,1086; SR \$X,\$X,52; SUB \$X,\$0,\$X; CSZ \$X,\$Z,64.
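The leading-zero trick can be modeled outside MMIX. In the sketch below, `flotu_round_off` imitates FLOTU with ROUND_OFF by discarding the bits that do not fit in a 53-bit significand before converting; that truncation step and both function names are assumptions of this illustration.

```python
import struct


def flotu_round_off(z):
    """Bit pattern of unsigned 64-bit z converted to floating point by truncation."""
    if z == 0:
        return 0
    excess = max(z.bit_length() - 53, 0)
    truncated = (z >> excess) << excess          # now exactly representable
    return struct.unpack(">Q", struct.pack(">d", float(truncated)))[0]


def leading_zeros(z):
    """Mirror of FLOTU; SETL $0,1086; SR; SUB; CSZ from the text."""
    x = flotu_round_off(z)   # FLOTU $X,ROUND_OFF,$Z
    x >>= 52                 # SR   $X,$X,52   (exponent field; sign bit is 0)
    x = 1086 - x             # SUB  $X,$0,$X
    if z == 0:               # CSZ  $X,$Z,64
        x = 64
    return x
```

Truncation matters: under round-to-nearest, an input such as $2^{64}-1$ would round up to $2^{64}$, giving exponent 1087 and a nonsensical count of -1.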

The Y field can also be used in the same way to specify any desired rounding mode in the other floating point instructions that have only a single operand, namely FSQRT and FINT. An illegal instruction interrupt occurs if Y exceeds 4 in any of these commands.

**29.** Subroutine linkage. MMIX has several special operations designed to facilitate the process of calling and implementing subroutines. The key notion is the idea of a hardware-supported register stack, which can coexist with a software-supported stack of variables that are not maintained in registers. From a programmer's standpoint, MMIX maintains a potentially unbounded list  $S[0], S[1], \ldots, S[\tau-1]$  of octabytes holding the contents of registers that are temporarily inaccessible; initially  $\tau=0$ . When a subroutine is entered, registers can be "pushed" on to the end of this list, increasing  $\tau$ ; when the subroutine has finished its execution, the registers are "popped" off again and  $\tau$  decreases.

Our discussion so far has treated all 256 registers \$0, \$1, ..., \$255 as if they were alike. But in fact, MMIX maintains two internal one-byte counters L and G, where  $0 \le L \le G < 256$ , with the property that

```
registers 0, 1, ..., L-1 are "local";
registers L, L+1, ..., G-1 are "marginal";
registers G, G+1, ..., 255 are "global."
```

A marginal register is zero when its value is read.

The G counter is normally set to a fixed value once and for all when a program is loaded, thereby defining the number of program variables that will live entirely in registers rather than in memory during the course of execution. A programmer may, however, change G dynamically using the PUT instruction described below.

The L counter starts at 0. If an instruction places a value into a register that is currently marginal, namely a register x such that  $L \leq x < G$ , the value of L will increase to x+1, and any newly local registers will be zero. For example, if L=10 and G=200, the instruction ADD \$5,\$15,1 would simply set \$5 to 1. But the instruction ADD \$15,\$5,\$200 would set \$10,\$11,...,\$14 to zero, \$15 to \$5+\$200, and L to 16. (The process of clearing registers and increasing L might take quite a few machine cycles in the worst case. We will see later that MMIX is able to take care of any high-priority interrupts that might occur during this time.)
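The example with L=10 and G=200 can be traced with a toy register file. This is a minimal sketch: the class `Registers` and its methods are invented for illustration, only the ADD instruction is modeled, and overflow detection is omitted.

```python
MASK64 = (1 << 64) - 1


class Registers:
    """Toy register file with local, marginal, and global registers."""

    def __init__(self, L, G):
        self.L, self.G = L, G
        self.r = [0] * 256

    def read(self, x):
        # A marginal register is zero when its value is read.
        if self.L <= x < self.G:
            return 0
        return self.r[x]

    def write(self, x, value):
        # Writing a marginal register makes it local and raises L to x+1;
        # all other newly local registers become zero.
        if self.L <= x < self.G:
            for j in range(self.L, x):
                self.r[j] = 0
            self.L = x + 1
        self.r[x] = value & MASK64

    def add(self, x, y_reg, z_value):
        """ADD $x,$y_reg,z_value, where z_value is already a number."""
        self.write(x, self.read(y_reg) + z_value)


regs = Registers(L=10, G=200)
regs.r[200] = 100                    # a global register
regs.r[12] = 999                     # stale garbage in a marginal register
regs.add(5, 15, 1)                   # ADD $5,$15,1
regs.add(15, 5, regs.read(200))      # ADD $15,$5,$200
```

After these two instructions \$5 holds 1, \$15 holds 101, registers \$10 through \$14 are zero, and L is 16.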

#### • PUSHJ \$X,@+4\*YZ[-262144] 'push registers and jump'; PUSHGO \$X,\$Y,\$Z|Z 'push registers and go'.

Suppose first that X < L. Register X is set equal to the number X, then registers 0, 1, ..., X are pushed onto the register stack as described below. If this instruction is in location  $\lambda$ , the value  $\lambda + 4$  is placed into the special return-jump register rJ. Then control jumps to instruction  $\lambda + 4YZ$  or  $\lambda + 4YZ - 262144$  or \$Y + \$Z or \$Y + Z, as in a JMP or GO command.

Pushing the first X + 1 registers onto the stack means essentially that we set  $S[\tau] \leftarrow \$0$ ,  $S[\tau+1] \leftarrow \$1$ , ...,  $S[\tau+X] \leftarrow \$X$ ,  $\tau \leftarrow \tau + X + 1$ ,  $\$0 \leftarrow \$(X+1)$ , ...,  $\$(L-X-2) \leftarrow \$(L-1)$ ,  $L \leftarrow L-X-1$ . For example, if X = 1 and L=5, the current contents of \$0 and the number 1 are placed on the register stack, where they will be temporarily inaccessible. Then control jumps to a subroutine with L reduced to 3; the registers that we had been calling \$2, \$3, and \$4 appear as \$0, \$1, and \$2 to the subroutine.

If  $L \leq X < G$ , the value of L increases to X + 1 as described above; then the rules for X < L apply.

If  $X \ge G$  the actions are similar, except that *all* of the local registers  $\$0, \ldots, \$(L-1)$  are placed on the register stack followed by the number L, and L is reset to zero. In particular, the instruction PUSHGO \$255,\$Y,\$Z pushes all the local registers onto the stack and sets L to zero, regardless of the previous value of L.

We will see later that MMIX is able to achieve the effect of pushing and renaming local registers without actually doing very much work at all.

#### • POP X,YZ 'pop registers and return from subroutine'.

This command preserves X of the current local registers, undoes the effect of the most recent PUSHJ or PUSHGO, and jumps to the instruction in  $M_4[4YZ+rJ]$ . If X>0, the value of \$(X-1) goes into the "hole" position where PUSHJ or PUSHGO stored the number of registers previously pushed.

The formal details of POP are slightly complicated, but we will see that they make sense: If X > L, we first replace X by L+1. Then we set  $x \leftarrow S[\tau-1] \mod 256$ ; this is the effective value of the X field in the push instruction that is being undone. Stack position  $S[\tau-1]$  is now set to \$(X-1) if  $0 < X \le L$ , otherwise it is set to zero. Then we essentially set  $L \leftarrow \min(x+X,G), \$(L-1) \leftarrow \$(L-x-2), \ldots, \$(x+1) \leftarrow \$0, \$x \leftarrow S[\tau-1], \ldots, \$0 \leftarrow S[\tau-x-1], \tau \leftarrow \tau-x-1$ . The operating system should arrange things so that a memory-protection interrupt will occur if a program does more pops than pushes. (If x > G, these formulas don't make sense as written; we actually set  $\$j \leftarrow S[\tau-x-1+j]$  for  $L > j \ge 0$  in that rare case.)

Suppose, for example, that a subroutine has three input parameters (\$0,\$1,\$2) and produces two outputs (\$0,\$1). If the subroutine does not call any other subroutines, it can simply end with POP 2,0, because rJ will contain the return address. Otherwise it should begin by saving rJ, for example with the instruction GET \$4,rJ if it will be using local registers \$0 through \$3, and it should use PUSHJ \$5 or PUSHGO \$5 when calling sub-subroutines; finally it should PUT rJ,\$4 before saying POP 2,0. To call the subroutine from another routine that has, say, 6 local registers, we would put the input arguments into \$7, \$8, and \$9, then issue the command PUSHGO \$6,base,Subr; in due time the outputs of the subroutine will appear in \$7 and \$6.
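The push and pop rules can be followed step by step in a small model. The sketch below tracks only L, the local registers, and the stack S; jumps, rJ, global registers, and the rare case x > G are left out, and the class `RegisterStack` with its method names is purely hypothetical.

```python
class RegisterStack:
    """Toy model of the local registers and the register stack S."""

    def __init__(self, G=32):
        self.G = G
        self.local = []   # $0 .. $(L-1); L is len(self.local)
        self.S = []       # S[0] .. S[tau-1]

    def get(self, x):
        # A marginal register reads as zero.
        return self.local[x] if x < len(self.local) else 0

    def set(self, x, value):
        # Writing a marginal register raises L to x+1, zeroing new locals.
        self.local.extend([0] * (x + 1 - len(self.local)))
        self.local[x] = value

    def push(self, X):
        if X >= self.G:               # push all locals, then the number L
            X = len(self.local)
        self.set(X, X)                # the hole remembers the count
        self.S.extend(self.local[:X + 1])
        self.local = self.local[X + 1:]

    def pop(self, X):
        L = len(self.local)
        if X > L:
            X = L + 1
        x = self.S[-1] % 256          # effective X of the push being undone
        hole = self.local[X - 1] if 0 < X <= L else 0
        saved = self.S[len(self.S) - x - 1:len(self.S) - 1]
        del self.S[len(self.S) - x - 1:]
        outputs = self.local[:X - 1] if X > 0 else []
        kept = saved + ([hole] if X > 0 else []) + outputs
        self.local = kept[:self.G]    # L becomes min(x+X, G)


m = RegisterStack()
for i in range(6):
    m.set(i, 100 + i)                 # caller's six local registers
m.set(7, 10); m.set(8, 20); m.set(9, 30)   # arguments in $7, $8, $9
m.push(6)                             # PUSHGO $6,base,Subr
a, b, c = m.get(0), m.get(1), m.get(2)     # subroutine sees them as $0,$1,$2
m.set(0, a + b + c)                   # first output in $0
m.set(1, a * b * c)                   # second output in $1
m.pop(2)                              # POP 2,0
```

Back in the caller, \$7 holds the subroutine's \$0 (the sum 60) and the hole \$6 holds the subroutine's \$1 (the product 6000), while \$0 through \$5 have been restored.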

Notice that the push and pop commands make use of a one-place "hole" in the register stack, between the registers that are pushed down and the registers that remain local. (The hole is position \$6 in the example just considered.) MMIX needs this hole position to remember the number of registers that are pushed down. A subroutine with no outputs ends with POP 0,0 and the hole disappears (becomes marginal). A subroutine with one output \$0 ends with POP 1,0 and the hole gets the former value of \$0. A subroutine with two outputs (\$0,\$1) ends with POP 2,0 and the hole gets the former value of \$1; in this case, therefore, the relative order of the two outputs has been switched on the register stack. If a subroutine has, say, five outputs ( $\$0,\ldots,\$4$ ), it ends with POP 5,0 and \$4 goes into the hole position, where it is followed by (\$0,\$1,\$2,\$3). MMIX makes this curious permutation in the case of multiple outputs because the hole is most easily plugged by moving one value down (namely \$4) instead of by sliding each of five values down in the stack.

These conventions for parameter passing are admittedly a bit confusing in the general case, and I suppose people who use them extensively might someday find themselves talking about "the infamous MMIX register shuffle." However, there is good use for subroutines that convert a sequence of register contents like (x, a, b, c) into (f, a, b, c) where f is a function of a, b, and c but not x. Moreover, PUSHGO and POP can be implemented with great efficiency, and subroutine linkage tends to be a significant bottleneck when other conventions are used.

Information about a subroutine's calling conventions needs to be communicated to a debugger. That can readily be done at the same time as we inform the debugger about the symbolic names of addresses in memory.

A subroutine that uses 50 local registers will not function properly if it is called by a program that sets G less than 50. MMIX does not allow the value of G to become less than 32. Therefore any subroutine that avoids global registers and uses at most 32 local registers can be sure to work properly regardless of the current value of G.

The rules stated above imply that a PUSHJ or PUSHGO instruction with X=255 pushes all of the currently defined local registers onto the stack and sets L to zero. This makes G local registers available for use by the subroutine jumped to. If that subroutine later returns with POP 0,0, the former value of L and the former contents of  $0, \ldots, L-1$  will be restored (assuming that G doesn't decrease).

A POP instruction with X=255 preserves all the local registers as outputs of the subroutine (provided that the total doesn't exceed G after popping), and puts zero into the hole (unless L=G=255). The best policy, however, is almost always to use POP with a small value of X, and in general to keep the value of L as small as possible by decreasing it when registers are no longer active. A smaller value of L means that MMIX can change context more easily when switching from one process to another.

**30.** System considerations. High-performance implementations of MMIX gain speed by keeping *caches* of instructions and data that are likely to be needed as computation proceeds. [See M. V. Wilkes, *IEEE Transactions* **EC-14** (1965), 270–271; J. S. Liptay, *IBM System J.* **7** (1968), 15–21.] Careful programmers can make the computer run even faster by giving hints about how to maintain such caches.

#### • LDUNC \$X,\$Y,\$Z|Z 'load octa uncached'.

These instructions, which have the same meaning as LDO, also inform the computer that the loaded octabyte (and its neighbors in a cache block) will probably not be read or written in the near future.

#### • STUNC \$X,\$Y,\$Z|Z 'store octa uncached'.

These instructions, which have the same meaning as STO, also inform the computer that the stored octabyte (and its neighbors in a cache block) will probably not be read or written in the near future.

#### • PRELD X,\$Y,\$Z|Z 'preload data'.

These instructions have no effect on registers or memory, but they inform the computer that many of the X+1 bytes M[\$Y+\$Z] through M[\$Y+\$Z+X], or M[\$Y+Z] through M[\$Y+Z+X], will probably be loaded and/or stored in the near future. No protection failure occurs if the memory is not accessible.

#### • PREGO X,\$Y,\$Z|Z 'prefetch to go'.

These instructions have no effect on registers or memory, but they inform the computer that many of the X+1 bytes M[\$Y+\$Z] through M[\$Y+\$Z+X], or M[\$Y+Z] through M[\$Y+Z+X], will probably be used as instructions in the near future. No protection failure occurs if the memory is not accessible.

#### • PREST X,\$Y,\$Z|Z 'prestore data'.

These instructions have no effect on registers or memory if the computer has no data cache. But when such a cache exists, they inform the computer that all of the X+1 bytes M[\$Y+\$Z] through M[\$Y+\$Z+X], or M[\$Y+Z] through M[\$Y+Z+X], will definitely be stored in the near future before they are loaded. (Therefore it is permissible for the machine to ignore the present contents of those bytes. Also, if those bytes are being shared by several processors, the current processor should try to acquire exclusive access.) No protection failure occurs if the memory is not accessible.

#### • SYNCD X,\$Y,\$Z|Z 'synchronize data'.

When executed from nonnegative locations, these instructions have no effect on registers or memory if neither a write buffer nor a "write back" data cache is present. But when such a buffer or cache exists, they force the computer to make sure that all data for the X+1 bytes M[\$Y+\$Z] through M[\$Y+\$Z+X], or M[\$Y+Z] through M[\$Y+Z+X], will be present in memory. (Otherwise the result of a previous store instruction might appear only in the cache; the computer is being told that now is the time to write the information back, if it hasn't already been written. A program can use this feature before outputting directly from memory.) No protection failure occurs if the memory is not accessible.

The action is similar when SYNCD is executed from a negative address, but in this case the specified bytes are also removed from the data cache (and from a secondary cache, if present). The operating system can use this feature when a page of virtual memory is being swapped out, or when data is input directly into memory.

#### • SYNCID X,\$Y,\$Z|Z 'synchronize instructions and data'.

When executed from nonnegative locations these instructions have no effect on registers or memory if the computer has no instruction cache separate from a data cache. But when such a cache exists, they force the computer to make sure that the X+1 bytes M[\$Y+\$Z] through M[\$Y+\$Z+X], or M[\$Y+Z] through M[\$Y+Z+X], will be interpreted correctly if used as instructions before they are next modified. (Generally speaking, an MMIX program is not expected to store anything in memory locations that are also being used as instructions. Therefore MMIX's instruction cache is allowed to become inconsistent with respect to its data cache. Programmers who insist on executing instructions that have been fabricated dynamically, for example when setting a breakpoint for debugging, must first SYNCID those instructions in order to guarantee that the intended results will be obtained.) A SYNCID command might be implemented in several ways; for example, the machine might update its instruction cache to agree with its data cache. A simpler solution, which is good enough because the need for SYNCID ought to be rare, removes instructions in the specified range from the instruction cache, if present, so that they will have to be fetched from memory the next time they are needed; in this case the machine also carries out the effect of a SYNCD command. No protection failure occurs if the memory is not accessible.

The behavior is more drastic, but faster, when SYNCID is executed from a negative location. Then all bytes in the specified range are simply removed from all caches, and the memory corresponding to any "dirty" cache blocks involving such bytes is not brought up to date. An operating system can use this version of the command when pages of virtual memory are being discarded (for example, when a program is being terminated).

**31.** MMIX is designed to work not only on a single processor but also in situations where several processors share a common memory. The following commands are useful for efficient operation in such circumstances.

#### • CSWAP \$X,\$Y,\$Z|Z 'compare and swap octabytes'.

If the octabyte  $M_8[\$Y + \$Z]$  or  $M_8[\$Y + Z]$  is equal to the contents of the special prediction register rP, it is replaced in memory with the contents of register X, and register X is set equal to 1. Otherwise the octabyte in memory replaces rP and register X is set to zero. This is an atomic (indivisible, uninterruptible) operation, useful for interprocess communication when independent computers are sharing the same memory.
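A small single-threaded model shows how the success and failure paths of compare-and-swap fit together. The class `Machine` and the retry loop `atomic_increment` are illustrative assumptions; in particular the model cannot demonstrate atomicity itself, only the register and memory effects described above.

```python
class Machine:
    """Toy model of memory octabytes and the prediction register rP."""

    def __init__(self):
        self.mem = {}    # address -> octabyte value
        self.rP = 0

    def cswap(self, x, addr):
        """Model of CSWAP: return the new value of register X (1 or 0)."""
        current = self.mem.get(addr, 0)
        if current == self.rP:
            self.mem[addr] = x        # prediction was right: store and succeed
            return 1
        self.rP = current             # prediction was wrong: learn the value
        return 0


def atomic_increment(m, addr):
    """Retry loop: predict the old value, try to install old value + 1."""
    while not m.cswap(m.rP + 1, addr):
        pass                          # rP now holds the fresh memory value


m = Machine()
m.mem[8] = 5
m.rP = 0                  # a stale prediction
atomic_increment(m, 8)    # first attempt fails and loads rP; second succeeds
```

The first attempt fails because rP does not match the octabyte 5, so rP becomes 5; the second attempt then stores 6.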

The compare-and-swap operation was introduced by IBM in late models of the System/370 architecture, and it soon spread to several other machines. Significant ways to use it are discussed, for example, in section 7.2.3 of Harold Stone's High-Performance Computer Architecture (Reading, Massachusetts: Addison-Wesley, 1987), and in sections 8.2 and 8.3 of Transaction Processing by Jim Gray and Andreas Reuter (San Francisco: Morgan Kaufmann, 1993).

#### • SYNC XYZ 'synchronize'.

If XYZ = 0, the machine drains its pipeline (that is, it stalls until all preceding instructions have completed their activity). If XYZ = 1, the machine controls its actions less drastically, in such a way that all store instructions preceding this SYNC will be completed before all store instructions after it. If XYZ = 2, the machine controls its actions in such a way that all load instructions preceding this SYNC will be completed before all load instructions after it. If XYZ = 3, the machine controls its actions in such a way that all load or store instructions preceding this SYNC will be completed before all load or store instructions after it. If XYZ = 4, the machine goes into a power-saver mode, in which instructions may be executed more slowly (or not at all) until some kind of "wake-up" signal is received. If XYZ = 5, the machine empties its write buffer and cleans its data caches, if any (including a possible secondary cache); the caches retain their data, but the cache contents also appear in memory. If XYZ = 6, the machine clears its virtual address translation caches (see below). If XYZ = 7, the machine clears its instruction and data caches, discarding any information in the data caches that wasn't previously in memory. ("Clearing" is stronger than "cleaning"; a clear cache remembers nothing. Clearing is also faster, because it simply obliterates everything.) If XYZ > 7, an illegal instruction interrupt occurs.

Of course no SYNC is necessary between a command that loads from or stores into memory and a subsequent command that loads from or stores into exactly the same location. However, SYNC might be necessary in certain cases even on a one-processor system, because input/output processes take place in parallel with ordinary computation.

The cases XYZ > 3 are *privileged*, in the sense that only the operating system can use them. More precisely, if a SYNC command is encountered with XYZ = 4 or XYZ = 5 or XYZ = 6 or XYZ = 7, a "privileged instruction interrupt" occurs unless that interrupt is currently disabled. Only the operating system can disable interrupts (see below).

**32. Trips and traps.** Special register rA records the current status information about arithmetic exceptions. Its least significant byte contains eight "event" bits called DVWIOUZX from left to right, where D stands for integer divide check, V for integer overflow, W for float-to-fix overflow, I for invalid operation, O for floating overflow, U for floating underflow, Z for floating division by zero, and X for floating inexact. The next least significant byte of rA contains eight "enable" bits with the same names DVWIOUZX and the same meanings. When an exceptional condition occurs, there are two cases: If the corresponding enable bit is 0, the corresponding event bit is set to 1. But if the corresponding enable bit is 1, MMIX interrupts its current instruction stream and executes a special "exception handler." Thus, the event bits record exceptions that have not been "tripped."

Floating point overflow always causes two exceptions, O and X. (The strictest interpretation of the IEEE standard would raise exception X on overflow only if floating overflow is not enabled, but MMIX always considers an overflowed result to be inexact.) Floating point underflow always causes both U and X when underflow is not enabled, and it might cause both U and X when underflow is enabled. If both enable bits are set to 1 in such cases, the overflow or underflow handler is called and the inexact handler is ignored. All other types of exceptions arise one at a time, so there is no ambiguity about which exception handler should be invoked unless exceptions are raised by "ropcode 2" (see below); in general the first enabled exception in the list DVWIOUZX takes precedence.

What about the six high-order bytes of the status register rA? At present, only two of those 48 bits are defined; the others must be zero for compatibility with possible future extensions. The two bits corresponding to  $2^{17}$  and  $2^{16}$  in rA specify a rounding mode, as follows: 00 means round to nearest (the default); 01 means round off (toward zero); 10 means round up (toward positive infinity); and 11 means round down (toward negative infinity).
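The layout of rA just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the MMIXware programs; the names `decode_rA`, `EXCEPTION_NAMES`, and `ROUNDING_MODES` are hypothetical.

```python
EXCEPTION_NAMES = "DVWIOUZX"  # left to right: D is the high-order bit of each byte
ROUNDING_MODES = {0b00: "nearest", 0b01: "off", 0b10: "up", 0b11: "down"}

def decode_rA(rA):
    """Split a 64-bit rA value into event bits, enable bits, and rounding mode."""
    if rA >> 18:
        # Only the low 18 bits are defined; the others must be zero.
        raise ValueError("undefined bits of rA must be zero")

    def names(byte):
        # Collect the letters whose bits are 1, scanning from the left.
        return "".join(n for i, n in enumerate(EXCEPTION_NAMES) if byte & (0x80 >> i))

    return {
        "events": names(rA & 0xFF),              # least significant byte
        "enables": names((rA >> 8) & 0xFF),      # next least significant byte
        "rounding": ROUNDING_MODES[(rA >> 16) & 0b11],  # bits 2^17 and 2^16
    }
```

For example, `decode_rA(0x20001)` reports the event X, no enabled exceptions, and rounding mode "up".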

**33.** The execution of MMIX programs can be interrupted in several ways. We have just seen that arithmetic exceptions will cause interrupts if they are enabled; so will illegal or privileged instructions, or instructions that are emulated in software instead of provided by the hardware. Input/output operations or external timers are another common source of interrupts; the operating system knows how to deal with all gadgets that might be hooked up to an MMIX processor chip. Interrupts occur also when memory accesses fail, for example if memory is nonexistent or protected. Power failures that force the machine to use its backup battery power in order to keep running in an emergency, or hardware failures like parity errors, all must be handled as gracefully as possible.

Users can also force interrupts to happen by giving explicit TRAP or TRIP instructions:

#### • TRAP X,Y,Z 'trap'; TRIP X,Y,Z 'trip'.

Both of these instructions interrupt processing and transfer control to a handler. The difference between them is that TRAP is handled by the operating system but TRIP is handled by the user. More precisely, the X, Y, and Z fields of TRAP have special significance predefined by the operating system kernel. For example, a system call (say an I/O command, or a command to allocate more memory) might be invoked by certain settings of X, Y, and Z. The X, Y, and Z fields of TRIP, on the other hand, are definable by users for their own applications, and users also define their own handlers. "Trip handler" programs invoked by TRIP are interruptible, but interrupts are normally inhibited while a TRAP is being serviced. Specific details about the precise actions of TRIP and TRAP appear below, together with the description of another command called RESUME that returns control from a handler to the interrupted program.

Only two variants of TRAP are predefined by the MMIX architecture: If XYZ = 0 in a TRAP command, a user process should terminate. If XYZ = 1, the operating system should provide default action for cases in which the user has not provided any handler for a particular kind of interrupt (see below).

A few additional variants of TRAP are predefined in the rudimentary operating system used with MMIX simulators. These variants, which allow simple input/output operations to be done, all have X=0, and the Y field is a small positive constant. For example, Y=1 invokes the Fopen routine, which opens a file. (See the program MMIX-SIM for full details.)

**34.** Non-catastrophic interrupts in MMIX are always precise, in the sense that all legal instructions before a certain point have effectively been executed, and no instructions after that point have yet been executed. The current instruction, which may or may not have been completed at the time of interrupt and which may or may not need to be resumed after the interrupt has been serviced, is put into the special execution register rX, and its operands (if any) are placed in special registers rY and rZ. The address of the following instruction is placed in the special where-interrupted register rW. The instruction in rX might not be the same as the instruction in location rW - 4; for example, it might be an instruction that branched or jumped to rW. It might also be an instruction inserted internally by the MMIX processor. (For example, the computer silently inserts an internal instruction that increases L before an instruction like ADD \$9,\$1,\$0 if L is currently less than 10. If an interrupt occurs between the inserted instruction and the ADD, the instruction in rX will say ADD, because an internal instruction retains the identity of the actual command that spawned it; but rW will point to the real ADD command.)

When an instruction has the normal meaning "set \$X to the result of \$Y op \$Z" or "set \$X to the result of \$Y op Z," special registers rY and rZ will relate in the obvious way to the Y and Z operands of the instruction; but this is not always the case. For example, after an interrupted store instruction, the first operand rY will hold the virtual memory address (\$Y plus either \$Z or Z), and the second operand rZ will be the octabyte to be stored in memory (including bytes that have not changed, in cases like STB). In other cases the actual contents of rY and rZ are defined by each implementation of MMIX, and programmers should not rely on their significance.

Some instructions take an unpredictable and possibly long amount of time, so it may be necessary to interrupt them in progress. For example, the FREM instruction (floating point remainder) is extremely difficult to compute rapidly if its first operand has an exponent of 2046 and its second operand has an exponent of 1. In such cases the rY and rZ registers saved during an interrupt show the current state of the computation, not necessarily the original values of the operands. The value of rY rem rZ will still be the desired remainder, but rY may well have been reduced to a number that has an exponent closer to the exponent of rZ. After the interrupt has been processed, the remainder computation will continue where it left off. (Alternatively, an operation like FREM or even FADD might be implemented in software instead of hardware, as we will see later.)

Another example arises with an instruction like PREST (prestore), which can specify prestoring up to 256 bytes. An implementation of MMIX might choose to prestore only 32 or 64 bytes at a time, depending on the cache block size; then it can change the contents of rX to reflect the unfinished part of a partially completed PREST command.

Commands that decrease G, pop the stack, save the current context, or unsave an old context also are interruptible. Register rX is used to communicate information about partial completion in such a way that the interruption will be essentially "invisible" after a program is resumed.


**35.** Three kinds of interruption are possible: trips, forced traps, and dynamic traps. We will discuss each of these in turn.

A TRIP instruction puts itself into the right half of the execution register rX, and sets the 32 bits of the left half to #80000000. (Therefore rX is negative; this fact will tell the RESUME command not to TRIP again.) The special registers rY and rZ are set to the contents of the registers specified by the Y and Z fields of the TRIP command, namely \$Y and \$Z. Then \$255 is placed into the special bootstrap register rB, and \$255 is set to rJ. MMIX now takes its next instruction from virtual memory address 0.
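The register shuffling performed by TRIP can be sketched as follows. This is a simplified illustration, not the simulator's code: it assumes a flat list of 256 general registers (ignoring the local/global distinction), takes the TRIP opcode to be #FF as in the MMIX opcode chart, and sets rW to the address of the following instruction as described for precise interrupts. The names `do_trip` and `is_negative` are hypothetical.

```python
MASK64 = (1 << 64) - 1

def do_trip(regs, special, pc, x, y, z):
    """Apply the effects of TRIP X,Y,Z found at address pc; return the next address."""
    inst = (0xFF << 24) | (x << 16) | (y << 8) | z   # the TRIP instruction itself
    special["rW"] = (pc + 4) & MASK64                # address of the following instruction
    special["rX"] = (0x80000000 << 32) | inst        # left half #80000000, right half TRIP
    special["rY"] = regs[y]                          # contents of $Y
    special["rZ"] = regs[z]                          # contents of $Z
    special["rB"] = regs[255]                        # $255 goes to the bootstrap register
    regs[255] = special["rJ"]                        # and $255 is set to rJ
    return 0                                         # handler starts at virtual address 0

def is_negative(octa):
    """True if the 64-bit value has its sign bit set."""
    return bool(octa >> 63)
```

Because the left half of rX is #80000000, `is_negative(special["rX"])` holds after a trip, which is what later tells RESUME not to trip again.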

Arithmetic exceptions interrupt the computation in essentially the same way as TRIP, if they are enabled. The only difference is that their handlers begin at the respective addresses 16, 32, 48, 64, 80, 96, 112, and 128, for exception bits D, V, W, I, O, U, Z, and X of rA; registers rY and rZ are set to the operands of the interrupted instruction as explained earlier.

A 16-byte block of memory is just enough for a sequence of commands like

PUSHJ 255, Handler; PUT rJ, \$255; GET \$255, rB; RESUME

which will invoke a user's handler. And if the user does not choose to provide a custom-designed handler, the operating system provides a default handler via the instructions

TRAP 1; GET \$255, rB; RESUME.
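The correspondence between exception bits and handler addresses, together with the precedence rule stated earlier (the first enabled exception in the list DVWIOUZX wins), can be sketched like this. The function names are hypothetical, and the sketch assumes that every raised exception that does not trip is recorded as an event bit, in keeping with the remark that event bits record exceptions that have not been tripped.

```python
EXCEPTIONS = "DVWIOUZX"

def handler_address(name):
    """Handlers for D, V, W, I, O, U, Z, X begin at 16, 32, ..., 128."""
    return 16 * (EXCEPTIONS.index(name) + 1)

def raise_exceptions(raised, rA):
    """Return (handler address or None, updated rA) for a set of raised exceptions."""
    enables = (rA >> 8) & 0xFF
    tripped = None
    for i, name in enumerate(EXCEPTIONS):
        if name not in raised:
            continue
        bit = 0x80 >> i
        if tripped is None and enables & bit:
            tripped = name          # first enabled exception takes precedence
        else:
            rA |= bit               # assumption: everything else becomes an event bit
    return (handler_address(tripped) if tripped else None), rA
```

With overflow and inexact both enabled, `raise_exceptions("OX", 0x0900)` selects the overflow handler at address 80; the inexact handler is not called.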

A trip handler might simply record the fact that tripping occurred. But the handler for an arithmetic interrupt might want to change the default result of a computation. In such cases, the handler should place the desired substitute result into rZ, and it should change the most significant byte of rX from #80 to #02. This will have the desired effect, because of the rules of RESUME explained below, *unless* the exception occurred on a command like STB or STSF. (A bit more work is needed to alter the effect of a command that stores into memory.)

Instructions in *negative* virtual locations do not invoke trip handlers, either for TRIP or for arithmetic exceptions. Such instructions are reserved for the operating system, as we will see.

**36.** A TRAP instruction interrupts the computation essentially like TRIP, but with the following modifications: (i) the interrupt mask register rK is cleared to zero, thereby inhibiting interrupts; (ii) control jumps to virtual memory address rT, not zero; (iii) information is placed in a separate set of special registers rBB, rWW, rXX, rYY, and rZZ, instead of rB, rW, rX, rY, and rZ. (These special registers are needed because a trap might occur while processing a TRIP.)

Another kind of forced trap occurs on implementations of MMIX that emulate certain instructions in software rather than in hardware. Such instructions cause a TRAP even though their opcode is something else like FREM or FADD or DIV. The trap handler can tell what instruction to emulate by looking at the opcode, which appears in rXX. In such cases the left-hand half of rXX is set to #02000000; the handler emulating FADD, say, should compute the floating point sum of rYY and rZZ and place the result in rZZ. A subsequent RESUME 1 will then place the value of rZZ in the proper register.

When a forced trap occurs on a store instruction because of memory protection failure, the settings of rYY and rZZ are undefined. They do not necessarily correspond to the virtual address rY and the octabyte to be stored rZ that are supplied to a trip handler after a tripped store instruction, because a forced trap aborts its instruction as soon as possible.

Implementations of MMIX might also emulate the process of virtual-address-to-physical-address translation described below, instead of providing for page table calculations in hardware. Then if, say, a LDB instruction does not know the physical memory address corresponding to a specified virtual address, it will cause a forced trap with the left half of rXX set to #03000000 and with rYY set to the virtual address in question. The trap handler should place the physical page address into rZZ; then RESUME 1 will complete the LDB.

**37.** The third and final kind of interrupt is called a *dynamic* trap. Such interruptions occur when one or more of the 64 bits in the special *interrupt request register* rQ have been set to 1, and when at least one corresponding bit of the special *interrupt mask register* rK is also equal to 1. The bit positions of rQ and rK have the general form

| low-priority I/O | program | high-priority I/O | machine |
|------------------|---------|-------------------|---------|
| 24 bits          | 8 bits  | 24 bits           | 8 bits  |

where the 8-bit "program" bits are called rwxnkbsp and have the following meanings:

- r bit: instruction tries to load from a page without read permission;
- w bit: instruction tries to store to a page without write permission;
- x bit: instruction appears in a page without execute permission;
- n bit: instruction refers to a negative virtual address;
- k bit: instruction is privileged, for use by the "kernel" only;
- b bit: instruction breaks the rules of MMIX;
- s bit: instruction violates security (see below);
- p bit: instruction comes from a privileged (negative) virtual address.

Negative addresses are for the use of the operating system only; a security violation occurs if an instruction in a nonnegative address is executed without the rwxnkbsp bits of rK all set to 1. (In such cases the s bits of both rQ and rK are set to 1.)

The eight "machine" bits of rQ and rK represent the most urgent kinds of interrupts. The rightmost bit stands for power failure, the next for memory parity error, the next for nonexistent memory, the next for rebooting, etc. Interrupts that need especially quick service, like requests from a high-speed network, also are allocated bit positions near the right end. Low priority I/O devices like keyboards are assigned to bits at the left. The allocation of input/output devices to bit positions will differ from implementation to implementation, depending on what devices are available.

Once  $rQ \wedge rK$  becomes nonzero, the machine waits briefly until it can give a precise interrupt. Then it proceeds as with a forced trap, except that it uses the special "dynamic trap address register" rTT instead of rT. The trap handler that begins at location rTT can figure out the reason for interrupt by examining  $rQ \wedge rK$ . (For example, after the instructions

```
GET $0,rQ; LDOU $1,savedK; AND $0,$0,$1; SUBU $1,$0,1; SADD $2,$1,$0; ANDN $1,$0,$1
```

the highest-priority offending bit will be in \$1 and its position will be in \$2.)
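The bit trick in that instruction sequence may be easier to follow in a higher-level notation. The following Python sketch mirrors the three arithmetic steps (SUBU, SADD, ANDN) on 64-bit values; the function name `highest_priority` is hypothetical, and returning `None` when nothing is pending is a convenience of the sketch, not something the MMIX sequence does.

```python
MASK64 = (1 << 64) - 1

def highest_priority(rQ, rK):
    """Return (bit, position) of the rightmost 1 in rQ & rK, or None if there is none."""
    pending = rQ & rK                              # AND  $0,$0,$1
    if pending == 0:
        return None
    below = (pending - 1) & MASK64                 # SUBU $1,$0,1
    position = bin(below & ~pending).count("1")    # SADD $2,$1,$0 counts bits of $1 & ~$0
    bit = pending & ~below                         # ANDN $1,$0,$1
    return bit, position
```

Subtracting 1 turns the trailing zeros of the pending word into ones and clears the rightmost 1, so the sideways add counts exactly the trailing zeros, which is the position of the highest-priority request.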

If the interrupted instruction contributed 1s to any of the rwxnkbsp bits of rQ, the corresponding bits are set to 1 also in rXX. A dynamic trap handler might be able to use this information (although it should service higher-priority interrupts first if the right half of rQ  $\wedge$  rK is nonzero).

The rules of MMIX are rigged so that only the operating system can execute instructions with interrupts suppressed. Therefore the operating system can in fact use instructions that would interrupt an ordinary program. Control of register rK turns out to be the ultimate privilege, and in a sense the only important one.

An instruction that causes a dynamic trap is usually executed before the interruption occurs. However, an instruction that traps with bits x, k, or b does nothing; a load instruction that traps with r or n loads zero; a store instruction that traps with any of rwxnkbsp stores nothing.

**38.** After a trip handler or trap handler has done its thing, it generally invokes the following command.

• RESUME Z 'resume after interrupt'; the X and Y fields must be zero.

If the Z field of this instruction is zero, MMIX will use the information found in special registers rW, rX, rY, and rZ to restart an interrupted computation. If the execution register rX is negative, it will be ignored and instructions will be executed starting at virtual address rW; otherwise the instruction in the right half of the execution register will be inserted into the program as if it had appeared in location rW - 4, subject to certain modifications that we will explain momentarily, and the *next* instruction will come from rW.

If the Z field of RESUME is 1 and if this instruction appears in a negative location, registers rWW, rXX, rYY, and rZZ are used instead of rW, rX, rY, and rZ. Also, just before resuming the computation, mask register rK is set to \$255 and \$255 is set to rBB. (Only the operating system gets to use this feature.)

An interrupt handler within the operating system might choose to allow itself to be interrupted. In such cases it should save the contents of rBB, rWW, rXX, rYY, and rZZ on some kind of stack, before making rK nonzero. Then, before resuming whatever caused the base level interrupt, it must again disable all interrupts; this can be done with TRAP, because the trap handler can tell from the virtual address in rWW that it has been invoked by the operating system. Once rK is again zero, the contents of rBB, rWW, rXX, rYY, and rZZ are restored from the stack, the outer level interrupt mask is placed in \$255, and RESUME 1 finishes the job.

Values of Z greater than 1 are reserved for possible later definition. Therefore they cause an illegal instruction interrupt (that is, they set the 'b' bit of rQ) in the present version of MMIX.

If the execution register rX is nonnegative, its leftmost byte controls the way its right-hand half will be inserted into the program. Let's call this byte the "ropcode." A ropcode of 0 simply inserts the instruction into the execution stream; a ropcode of 1 is similar, but it substitutes rY and rZ for the two operands, assuming that this makes sense for the operation considered.

Ropcode 2 inserts a command that sets \$X to rZ, where X is the second byte in the right half of rX. This ropcode is normally used with forced-trap emulations, so that the result of an emulated instruction is placed into the correct register. It also uses the third-from-left byte of rX to raise any or all of the arithmetic exceptions DVWIOUZX, at the same time as rZ is being placed in \$X. Emulated instructions and explicit TRAP commands can therefore cause overflow, say, just as ordinary instructions can. (Such new exceptions may, of course, spawn a trip interrupt, if any of the corresponding bits are enabled in rA.)

Finally, ropcode 3 is the same as ropcode 0, except that it also tells MMIX to treat rZ as the page table entry for the virtual address rY. (See the discussion of virtual address translation below.) Ropcodes greater than 3 are not permitted; moreover, only RESUME 1 is allowed to use ropcode 3.

The ropcode rules in the previous paragraphs should of course be understood to involve rWW, rXX, rYY, and rZZ instead of rW, rX, rY, and rZ when the ropcode is seen by RESUME 1. Thus, in particular, ropcode 3 always applies to rYY and rZZ, never to rY and rZ.

Special restrictions must hold if resumption is to work properly: Ropcodes 0 and 3 must not insert a RESUME instruction; ropcode 1 must insert a "normal" instruction, namely one whose opcode begins with one of the hexadecimal digits #0, #1, #2, #3, #6, #7, #C, #D, or #E. (See the opcode chart below.) Some implementations may also allow ropcode 1 with SYNCD[I] and SYNCID[I], so that those instructions can conveniently be interrupted. Moreover, the destination register \$X used with ropcode 1 or 2 must not be marginal. All of these restrictions hold automatically in normal use; they are relevant only if the programmer tries to do something tricky.
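The ropcode rules can be summarized by a small decoder. This Python sketch is illustrative only: `plan_resume` and its result dictionaries are hypothetical, the RESUME opcode is taken to be #F9 as in the MMIX opcode chart, and the optional allowance for SYNCD[I] and SYNCID[I] with ropcode 1 is not modeled. For RESUME 1 the argument would be rXX rather than rX.

```python
NORMAL_LEADING_DIGITS = {0x0, 0x1, 0x2, 0x3, 0x6, 0x7, 0xC, 0xD, 0xE}
RESUME_OPCODE = 0xF9

def plan_resume(rX, z=0):
    """Describe what RESUME Z will do with execution register rX (or rXX if z is 1)."""
    if z > 1:
        raise ValueError("Z > 1 is reserved (illegal instruction)")
    if rX >> 63:
        return {"action": "continue at rW"}      # negative rX is ignored
    ropcode = rX >> 56                           # leftmost byte
    inst = rX & 0xFFFFFFFF                       # right-hand half
    opcode = inst >> 24
    if ropcode > 3:
        raise ValueError("ropcodes greater than 3 are not permitted")
    if ropcode == 3 and z != 1:
        raise ValueError("only RESUME 1 may use ropcode 3")
    if ropcode in (0, 3) and opcode == RESUME_OPCODE:
        raise ValueError("ropcodes 0 and 3 must not insert a RESUME")
    if ropcode == 1 and (opcode >> 4) not in NORMAL_LEADING_DIGITS:
        raise ValueError("ropcode 1 must insert a normal instruction")
    if ropcode == 2:
        return {"action": "set register", "ropcode": 2,
                "target": (inst >> 16) & 0xFF,       # X: second byte of the right half
                "exceptions": (rX >> 40) & 0xFF}     # third-from-left byte of rX
    return {"action": "insert", "ropcode": ropcode, "instruction": inst}
```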

Notice that the slightly tricky sequence

LDA \$0,Loc; PUT rW,\$0; LDTU \$1,Inst; PUT rX,\$1; RESUME

will execute an almost arbitrary instruction Inst as if it had been in location Loc-4, and then will jump to location Loc (assuming that Inst doesn't branch elsewhere).

**39. Special registers.** Quite a few special registers have been mentioned so far, and MMIX actually has even more. It is time now to enumerate them all, together with their internal code numbers:

```
 rA, arithmetic status register [21];
 rB, bootstrap register (trip) [0];
 rC, continuation register [8];
 rD, dividend register [1];
 rE, epsilon register [2];
 rF, failure location register [22];
 rG, global threshold register [19];
 rH, himult register [3];
 rI, interval counter [12];
 rJ, return-jump register [4];
 rK, interrupt mask register [15];
 rL, local threshold register [20];
 rM, multiplex mask register [5];
 rN, serial number [9];
 rO, register stack offset [10];
 rP, prediction register [23];
 rQ, interrupt request register [16];
 rR, remainder register [6];
 rS, register stack pointer [11];
 rT, trap address register [13];
 rU, usage counter [17];
 rV, virtual translation register [18];
 rW, where-interrupted register (trip) [24];
 rX, execution register (trip) [25];
 rY, Y operand (trip) [26];
 rZ, Z operand (trip) [27];
rBB, bootstrap register (trap) [7];
rTT, dynamic trap address register [14];
rWW, where-interrupted register (trap) [28];
rXX, execution register (trap) [29];
rYY, Y operand (trap) [30];
rZZ, Z operand (trap) [31].
```

In this list rG and rL are what we have been calling simply G and L; rC, rF, rI, rN, rO, rS, rU, and rV have not been mentioned before.

**40.** The *interval counter* rI decreases by 1 on every "clock pulse" of the MMIX processor. Thus if MMIX is running at 500 MHz, the interval counter decreases every 2 nanoseconds. It causes an *interval interrupt* when it reaches zero. Such interrupts can be extremely useful for "continuous profiling" as a means of studying the empirical running time of programs; see Jennifer M. Anderson, Lance M. Berc, Jeffrey Dean, Sanjay Ghemawat, Monika R. Henzinger, Shun-Tak A. Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, and William E. Weihl, *ACM Transactions on Computer Systems* **15** (1997), 357–390. The interval interrupt is achieved by setting the next-to-leftmost bit of the "machine" byte of rQ equal to 1; this is the seventh-least-significant bit.

The usage counter rU consists of three fields $(u_p, u_m, u_c)$, called the usage pattern $u_p$, the usage mask $u_m$, and the usage count $u_c$. The most significant byte of rU is the usage pattern; the next most significant byte is the usage mask; and the remaining 48 bits are the usage count. Whenever an instruction whose $OP \wedge u_m = u_p$ has been executed, the value of $u_c$ increases by 1 (modulo $2^{47}$). Thus, for example, the OP-code chart below implies that all instructions are counted if $u_p = u_m = 0$; all loads and stores are counted together with GO and PUSHGO if $u_p = (10000000)_2$ and $u_m = (11000000)_2$; all floating point instructions are counted together with fixed point multiplications and divisions if $u_p = 0$ and $u_m = (11100000)_2$; fixed point multiplications and divisions alone are counted if $u_p = (00011000)_2$ and $u_m = (11111000)_2$; completed subroutine calls are counted if $u_p = POP$ and $u_m = (11111111)_2$. Instructions in negative locations, which belong to the operating system, are exceptional: They are included in the usage count only if the leading bit of $u_c$ is 1.
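A short Python sketch may help to make the pattern-and-mask test concrete. The helper names are hypothetical, and the opcode values used in the example (#18 through #1F for MUL, MULU, DIV, DIVU and their immediate forms, #9E for GO, #BE for PUSHGO) are taken from the MMIX opcode chart rather than from this section. The mask $(11111000)_2$ is the one that selects exactly the eight opcodes #18 through #1F.

```python
COUNT_MASK = (1 << 48) - 1

def unpack_rU(rU):
    """Return the fields (u_p, u_m, u_c) of the usage counter."""
    return rU >> 56, (rU >> 48) & 0xFF, rU & COUNT_MASK

def is_counted(op, rU, negative_location=False):
    """Would an executed instruction with opcode op be counted?"""
    u_p, u_m, u_c = unpack_rU(rU)
    if negative_location and not (u_c >> 47):
        return False                      # OS instructions count only if the leading bit is 1
    return (op & u_m) == u_p

def count_one(rU):
    """Increase u_c by 1 modulo 2^47, leaving its leading bit alone."""
    u_p, u_m, u_c = unpack_rU(rU)
    leading = u_c & (1 << 47)
    u_c = leading | ((u_c + 1) & ((1 << 47) - 1))
    return (u_p << 56) | (u_m << 48) | u_c
```

For instance, with `rU = (0x18 << 56) | (0xF8 << 48)` every opcode from #18 to #1F is counted and nothing else is.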

Incidentally, the 64-bit counter rI can be implemented rather cheaply with only two levels of logic, using an old trick called "carry-save addition" [see, for example, G. Metze and J. E. Robertson, *Proc. International Conf. Information Processing* (Paris: 1959), 389–396]. One nice embodiment of this idea is to represent a binary number x in a redundant form as the difference x'-x'' of two binary numbers. Any two such numbers can be added without carry propagation as follows: Let

$$f(x, y, z) = (x \wedge \bar{y}) \vee (x \wedge z) \vee (\bar{y} \wedge z), \qquad g(x, y, z) = x \oplus y \oplus z.$$

Then it is easy to check that x - y + z = 2f(x, y, z) - g(x, y, z); we need only verify this in the eight cases when x, y, and z are 0 or 1. Thus we can subtract 1 from a counter x' - x'' by setting

$$(x', x'') \leftarrow (f(x', x'', -1) \ll 1, g(x', x'', -1));$$

we can add 1 by setting  $(x', x'') \leftarrow (g(x'', x', -1), f(x'', x', -1) \ll 1)$ . The result is zero if and only if x' = x''. We need not actually compute the difference x' - x'' until we need to examine the register. The computation of f(x, y, z) and g(x, y, z) is particularly simple in the special cases z = 0 and z = -1. A similar trick works for rU, but extra care is needed in that case because several instructions might finish at the same time. (Thanks to Frank Yellin for his improvements to this paragraph.)
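The carry-free update rules can be checked with a few lines of Python. This is a software sketch of the idea only, with 64-bit words simulated by masking; `f` and `g` follow the definitions in the text, while `decrement`, `increment`, and `value` are hypothetical names.

```python
MASK64 = (1 << 64) - 1   # also serves as the 64-bit pattern for -1

def f(x, y, z):
    """Bitwise f(x,y,z) = (x & ~y) | (x & z) | (~y & z)."""
    return ((x & ~y) | (x & z) | (~y & z)) & MASK64

def g(x, y, z):
    """Bitwise g(x,y,z) = x xor y xor z."""
    return (x ^ y ^ z) & MASK64

def decrement(xp, xpp):
    """Subtract 1 from the counter represented as xp - xpp."""
    return (f(xp, xpp, MASK64) << 1) & MASK64, g(xp, xpp, MASK64)

def increment(xp, xpp):
    """Add 1 to the counter represented as xp - xpp."""
    return g(xpp, xp, MASK64), (f(xpp, xp, MASK64) << 1) & MASK64

def value(xp, xpp):
    """The actual counter value, computed only when someone needs to look at it."""
    return (xp - xpp) & MASK64
```

Starting from the pair (5, 0), five calls of `decrement` bring the counter to zero, at which point the two components are equal, just as the text predicts.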

**41.** The special serial number register rN is permanently set to the time this particular instance of MMIX was created (measured as the number of seconds since 00:00:00 Greenwich Mean Time on 1 January 1970), in its five least significant bytes. The three most significant bytes are permanently set to the version number of the MMIX architecture that is being implemented, together with two additional bytes that modify the version number. This quantity serves as an essentially unique identification number for each copy of MMIX.
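The byte layout of rN can be sketched as follows. The function and parameter names are hypothetical; the sketch simply assumes that the three leading bytes hold the version number and its two modifiers in that order, followed by the 40-bit creation time.

```python
TIME_MASK = (1 << 40) - 1   # five least significant bytes

def make_rN(version, subversion, subsubversion, created):
    """Pack a version triple and a creation time (seconds since 1970) into rN."""
    for byte in (version, subversion, subsubversion):
        if not 0 <= byte < 256:
            raise ValueError("each version component must fit in one byte")
    return (version << 56) | (subversion << 48) | (subsubversion << 40) | (created & TIME_MASK)

def split_rN(rN):
    """Recover ((version, subversion, subsubversion), creation time) from rN."""
    return ((rN >> 56, (rN >> 48) & 0xFF, (rN >> 40) & 0xFF), rN & TIME_MASK)
```

Thus an implementation of version 1.0.1 created at time 946684800 would have 1, 0, 1 in its three leading bytes.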

Version 1.0.0 of the architecture is described in the present document. Version 1.0.1 is similar, but simplified to avoid the complications of pipelines and operating systems. Other versions may become necessary in the future.

**42.** The register stack offset rO and register stack pointer rS are especially interesting, because they are used to implement MMIX's register stack  $S[0], S[1], S[2], \ldots$ 

The operating system initializes a register stack by assigning a large area of virtual memory to each running process, beginning at an address like #6000000000000000. If this starting address is $\sigma$, stack entry S[k] will go into the octabyte $M_8[\sigma+8k]$. Stack underflow will be detected because the process does not have permission to read from $M[\sigma-1]$. Stack overflow will be detected because something will give out (either the user's budget or the user's patience or the user's swap space) long before $2^{61}$ bytes of virtual memory are filled by a register stack.

The MMIX hardware maintains the register stack by having two banks of 64-bit general-purpose registers, one for globals and one for locals. The global registers g[32], g[33], ..., g[255] are used for register numbers that are  $\geq G$  in MMIX commands; recall that G is always 32 or more. The local registers come from another array that contains  $2^n$  registers for some n where  $8 \leq n \leq 10$ ; for simplicity of exposition we will assume that there are exactly 512 local registers, but there may be only 256 or there may be 1024.

The local register slots  $l[0], l[1], \ldots, l[511]$  act as a cyclic buffer with addresses that wrap around mod 512, so that l[512] = l[0], l[513] = l[1], etc. This buffer is divided into three parts by three pointers, which we will call  $\alpha, \beta$ , and  $\gamma$ .



Registers $l[\alpha]$, $l[\alpha+1]$, ..., $l[\beta-1]$ are what program instructions currently call \$0, \$1, ..., \$(L-1); registers $l[\beta]$, $l[\beta+1]$, ..., $l[\gamma-1]$ are currently unused; and registers $l[\gamma]$, $l[\gamma+1]$, ..., $l[\alpha-1]$ contain items of the register stack that have been pushed down but not yet stored in memory. Special register rS holds the virtual memory address where $l[\gamma]$ will be stored, if necessary. Special register rO holds the address where $l[\alpha]$ will be stored; this always equals $8\tau$ plus the address of S[0]. We can deduce the values of $\alpha$, $\beta$, and $\gamma$ from the contents of rL, rO, and rS, because

$$\alpha = (rO/8) \bmod 512, \qquad \beta = (\alpha + rL) \bmod 512, \qquad \gamma = (rS/8) \bmod 512.$$
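These three formulas translate directly into code. The following Python function is a sketch with a hypothetical name; the ring size is a parameter so that the 256- and 1024-register variants can be tried as well.

```python
def ring_pointers(rL, rO, rS, ring_size=512):
    """Deduce the ring pointers (alpha, beta, gamma) from rL, rO, and rS."""
    alpha = (rO // 8) % ring_size         # slot of $0
    beta = (alpha + rL) % ring_size       # first unused slot
    gamma = (rS // 8) % ring_size         # oldest pushed-down item not yet in memory
    return alpha, beta, gamma
```

For example, if the stack begins at $\sigma$ = #6000000000000000 and we have rO = $\sigma$ + 24, rS = $\sigma$, and rL = 5, the pointers are $\alpha = 3$, $\beta = 8$, and $\gamma = 0$.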

To maintain this situation we need to make sure that the pointers  $\alpha$ ,  $\beta$ , and  $\gamma$  never move past each other. A PUSHJ or PUSHGO operation simply advances  $\alpha$  toward  $\beta$ , so it is very simple. The first part of a POP operation, which moves  $\beta$  toward  $\alpha$ , is also very simple. But the next part of a POP requires  $\alpha$  to move downward, and memory accesses might be required. MMIX will decrease rS by 8 (thereby decreasing  $\gamma$  by 1) and set  $l[\gamma] \leftarrow M_8[rS]$ , one or more times if necessary, to keep  $\alpha$  from decreasing past  $\gamma$ . Similarly, the operation of increasing L may cause MMIX to set  $M_8[rS] \leftarrow l[\gamma]$  and increase rS by 8 (thereby increasing  $\gamma$  by 1) one or more times, to keep  $\beta$  from increasing past  $\gamma$ . (Actually  $\beta$  is never allowed to increase to the point where it becomes equal to  $\gamma$ .) If many registers need to be loaded or stored at once, these operations are interruptible.
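The interplay of the three pointers can be explored with a toy model. The Python class below is a deliberately simplified sketch, not the simulator's algorithm: all names are hypothetical, memory is a dictionary, the extra register that PUSHJ uses to remember the size of a frame is omitted, and frames are assumed to be small compared to the ring.

```python
class RegisterRing:
    """Toy model of the cyclic buffer of local registers."""

    def __init__(self, base, size=512):
        self.size = size
        self.l = [0] * size     # local register slots
        self.mem = {}           # octabyte address -> value, standing in for M8
        self.rO = base          # address where l[alpha] would be stored
        self.rS = base          # address where l[gamma] would be stored
        self.rL = 0

    def alpha(self): return (self.rO // 8) % self.size
    def beta(self): return (self.alpha() + self.rL) % self.size
    def gamma(self): return (self.rS // 8) % self.size

    def increase_L(self):
        """Bring one more local register into use, spilling l[gamma] if necessary."""
        if (self.beta() + 1) % self.size == self.gamma():
            self.mem[self.rS] = self.l[self.gamma()]      # M8[rS] <- l[gamma]
            self.rS += 8                                  # gamma increases by 1
        self.l[self.beta()] = 0
        self.rL += 1

    def decrease_L(self, k):
        """Move beta toward alpha (the first part of a POP)."""
        self.rL -= k

    def advance_alpha(self, k):
        """Push down k registers, moving alpha toward beta as PUSHJ does."""
        self.rO += 8 * k
        self.rL -= k

    def retreat_alpha(self, k):
        """Bring back k pushed-down registers, reloading any that were spilled."""
        for _ in range(k):
            if self.rO == self.rS:                        # alpha must not pass gamma
                self.rS -= 8                              # gamma decreases by 1
                self.l[self.gamma()] = self.mem[self.rS]  # l[gamma] <- M8[rS]
            self.rO -= 8
            self.rL += 1

    def reg(self, j): return self.l[(self.alpha() + j) % self.size]
    def set_reg(self, j, v): self.l[(self.alpha() + j) % self.size] = v
```

With a ring of only eight slots, pushing five frames of three registers each forces eight octabytes into memory; retreating through the frames afterwards brings every value back, and rO and rS return to their starting address.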

[A somewhat similar scheme was introduced by David R. Ditzel and H. R. McLellan in SIGPLAN Notices 17,4 (April 1982), 48–56, and incorporated in the so-called CRISP architecture developed at AT&T Bell Labs. An even more similar scheme was adopted in the late 1980s by Advanced Micro Devices, in the processors of their Am29000 series—a family of computers whose instructions have essentially the format 'OP X Y Z' used by MMIX.]

Limited versions of MMIX, having fewer registers, can also be envisioned. For example, we might have only 32 local registers  $l[0], l[1], \ldots, l[31]$  and only 32 global registers  $g[224], g[225], \ldots, g[255]$ . Such a machine could run any MMIX program that maintains the inequalities L < 32 and  $G \ge 224$ .

**43.** Access to MMIX's special registers is obtained via the GET and PUT commands.

• GET \$X, Z 'get from special register'; the Y field must be zero.

Register X is set to the contents of the special register identified by its code number Z, using the code numbers listed earlier. An illegal instruction interrupt occurs if  $Z \ge 32$ .

Every special register is readable; MMIX does not keep secrets from an inquisitive user. But of course only the operating system is allowed to change registers like rK and rQ (the interrupt mask and request registers). And not even the operating system is allowed to change rN (the serial number) or the stack pointers rO and rS.

• PUT X, \$Z|Z 'put into special register'; the Y field must be zero.

The special register identified by X is set to the contents of register Z or to the unsigned byte Z itself, if permissible. Some changes are, however, impermissible: Bits of rA that are always zero must remain zero; the leading seven bytes of rG and rL must remain zero, and rL must not exceed rG; special registers 9–11 (namely rN, rO, and rS) must not change; special registers 8 and 12–18 (namely rC, rI, rK, rQ, rT, rU, rV, and rTT) can be changed only if the privilege bit of rK is zero; and certain bits of rQ (depending on available hardware) might not allow software to change them from 0 to 1. Moreover, any bits of rQ that have changed from 0 to 1 since the most recent GET x,rQ will remain 1 after PUT rQ,z. The PUT command will not increase rL; it sets rL to the minimum of the current value and the new value. (A program should say SETL \$99,0 instead of PUT rL,100 when rL is known to be less than 100.)

Impermissible PUT commands cause an illegal instruction interrupt, or (in the case of rC, rI, rK, rQ, rT, rU, rV, and rTT) a privileged operation interrupt.
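The main PUT restrictions can be gathered into one function. The following Python sketch is illustrative only: the names are hypothetical, special registers live in a dictionary keyed by code number, the flag `privileged` stands for "the privilege bit of rK is zero," and `new_rQ_bits` stands for the bits of rQ that have changed from 0 to 1 since the most recent GET. Hardware-dependent restrictions on rQ are not modeled, and rejecting code numbers outside 0 to 31 or values of rG below 32 are assumptions drawn from the rules for GET and for G.

```python
class IllegalInstruction(Exception): pass
class PrivilegedOperation(Exception): pass

RN, RO, RS, RQ, RG, RL, RA = 9, 10, 11, 16, 19, 20, 21
UNCHANGEABLE = {RN, RO, RS}                   # special registers 9-11
PRIVILEGED = {8} | set(range(12, 19))         # rC and special registers 12-18

def put(special, code, value, privileged=False, new_rQ_bits=0):
    """Store value into special register `code` if permissible; return what was stored."""
    if not 0 <= code < 32 or code in UNCHANGEABLE:
        raise IllegalInstruction(code)
    if code in PRIVILEGED and not privileged:
        raise PrivilegedOperation(code)
    if code == RA and value >> 18:
        raise IllegalInstruction(code)        # bits of rA that are always zero
    if code == RG and (value > 255 or value < 32 or value < special[RL]):
        raise IllegalInstruction(code)        # rL must not exceed rG
    if code == RL:
        value = min(special[RL], value)       # PUT never increases rL
    if code == RQ:
        value |= new_rQ_bits                  # fresh interrupt requests survive
    special[code] = value
    return value
```

Notice that `put(special, RL, 100)` leaves rL unchanged when rL is 5; this is why SETL \$99,0 is the right way to make rL grow.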

• SAVE \$X,0 'save process state'; UNSAVE 0,\$Z 'restore process state'; the Y field must be 0, and so must the Z field of SAVE, the X field of UNSAVE.

The SAVE instruction stores all registers and special registers that might affect the computation of the currently running process. First the current local registers  $\$0, \$1, \ldots, \$(L-1)$  are pushed down as in PUSHGO \$255, and L is set to zero. Then the current global registers  $\$G, \$(G+1), \ldots, \$255$  are placed above them in the register stack; finally rB, rD, rE, rH, rJ, rM, rR, rP, rW, rX, rY, and rZ are placed at the very top, followed by registers rG and rA packed into eight bytes:

| 8  | 24 | 32 |
|----|----|----|
| rG | 0  | rA |

The address of the topmost octabyte is then placed in register X, which must be a global register. (This instruction is interruptible. If an interrupt occurs while the registers are being saved, we will have  $\alpha = \beta = \gamma$  in the ring of local registers; thus rO will equal rS and rL will be zero. The interrupt handler essentially has a new register stack, starting on top of the partially saved context.) Immediately after a SAVE the values of rO and rS are equal to the location of the first byte following the stack just saved. The current register stack is effectively empty at this point; thus one shouldn't do a POP until this context or some other context has been unsaved.

The UNSAVE instruction goes the other way, restoring all the registers when given an address in register Z that was returned by a previous SAVE. Immediately after an UNSAVE the values of rO and rS will be equal. Like SAVE, this instruction is interruptible.

The operating system uses SAVE and UNSAVE to switch context between different processes. It can also use UNSAVE to establish suitable initial values of rO and rS. But a user program that knows what it is doing can in fact allocate its own register stack or stacks and do its own process switching.

Caution: UNSAVE is destructive, in the sense that a program can't reliably UNSAVE twice from the same saved context. Once an UNSAVE has been done, further operations are likely to change the memory record of what was saved. Moreover, an interrupt during the middle of an UNSAVE may have already clobbered some of the data in memory before the UNSAVE has completely finished, although the data will appear properly in all registers.

**44.** Virtual and physical addresses. Virtual 64-bit addresses are converted to physical addresses in a manner governed by the special virtual translation register rV. Thus M[A] really refers to m[ $\phi(A)$ ], where m is the physical memory array and  $\phi(A)$  is determined by the physical mapping function  $\phi$ . The details of this conversion are rather technical and of interest mainly to the operating system, but two simple rules are important to ordinary users:

• Negative addresses are mapped directly to physical addresses, by simply suppressing the sign bit:

$$\phi(A) = A + 2^{63} = A \wedge \text{\#7fffffffffffffff}, \quad \text{if } A < 0.$$

All accesses to negative addresses are privileged, for use by the operating system only. (Thus, for example, the trap addresses in rT and rTT should be negative, because they are addresses inside the operating system.) Moreover, all physical addresses  $\geq 2^{48}$  are intended for use by memory-mapped I/O devices; values read from or written to such locations are never placed in a cache.

• Nonnegative addresses belong to four *segments*, depending on whether the three leading bits are 000, 001, 010, or 011. These  $2^{61}$ -byte segments are traditionally used for a program's text, data, dynamic memory, and register stack, respectively, but such conventions are not mandatory. There are four mappings  $\phi_0$ ,  $\phi_1$ ,  $\phi_2$ , and  $\phi_3$  of 61-bit addresses into 48-bit physical memory space, one for each segment:

$$\phi(A) = \phi_{\lfloor A/2^{61} \rfloor}(A \mod 2^{61}), \quad \text{if } 0 \le A < 2^{63}.$$

In general, the machine is able to access smaller addresses of a segment more efficiently than larger addresses. Thus a programmer should let each segment grow upward from zero, trying to keep any of the 61-bit addresses from becoming larger than necessary, although arbitrary addresses are legal.
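The two rules for ordinary users can be sketched in a few lines of C. This is only an illustration; the function names are hypothetical and are not part of the MMIXware package. A 64-bit unsigned integer stands for the virtual address, so "negative" means that bit 63 is set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not part of MMIXware) illustrating the two rules. */

/* An address is "negative" when its sign bit (bit 63) is set. */
int is_negative_address(uint64_t a) { return (int)(a >> 63); }

/* Rule 1: a negative address maps to the physical address obtained by
   suppressing the sign bit. */
uint64_t negative_to_physical(uint64_t a)
{
  assert(is_negative_address(a));
  return a & UINT64_C(0x7fffffffffffffff);
}

/* Rule 2: a nonnegative address lies in segment 0, 1, 2, or 3, according
   to its three leading bits 000, 001, 010, 011. */
unsigned segment_of(uint64_t a)
{
  assert(!is_negative_address(a));
  return (unsigned)(a >> 61);
}

/* The remaining 61 bits are the address within the segment. */
uint64_t offset_in_segment(uint64_t a)
{
  return a & ((UINT64_C(1) << 61) - 1);
}
```

For instance, the address #6000000000000010 lies in segment 3 (traditionally the register stack) at offset #10, while #8000000000001000 is a privileged address that maps to physical location #1000.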

**45.** Now it's time for the technical details of virtual address translation. The mappings  $\phi_0$ ,  $\phi_1$ ,  $\phi_2$ , and  $\phi_3$  are defined by the following rules.

(1) The first two bytes of rV are four nybbles called  $b_1$ ,  $b_2$ ,  $b_3$ ,  $b_4$ ; we also define  $b_0 = 0$ . Segment i has at most  $1024^{b_{i+1}-b_i}$  pages. In particular, segment i must have at most one page when  $b_i = b_{i+1}$ , and it must be entirely empty if  $b_i > b_{i+1}$ .

(2) The next byte of rV, s, specifies the current page size, which is  $2^s$  bytes. We must have  $s \ge 13$  (hence at least 8192 bytes per page). Values of s larger than, say, 20 or so are of use only in rather large programs that will reside in main memory for long periods of time, because memory protection and swapping are applied to entire pages. The maximum legal value of s is 48.

(3) The remaining five bytes of rV are a 27-bit root location r, a 10-bit address space number n, and a 3-bit function field f:

|      | 4     | 4     | 4     | 4     | 8 | 27 | 10 | 3 |
|------|-------|-------|-------|-------|---|----|----|---|
| rV = | $b_1$ | $b_2$ | $b_3$ | $b_4$ | s | r  | n  | f |

Normally f = 0; if f = 1, virtual address translation will be done by software instead of hardware, and the  $b_1$ ,  $b_2$ ,  $b_3$ ,  $b_4$ , and r fields of rV will be ignored by the hardware.

(Values of f > 1 are reserved for possible future use; if f > 1 when MMIX tries to translate an address, a memory-protection failure will occur.)

(4) Each page has an 8-byte page table entry (PTE), which looks like this:

|       | 16 | 48-s | s-13 | 10 | 3 |
|-------|----|------|------|----|---|
| PTE = | x  | a    | y    | n  | p |

Here x and y are ignored (thus they are usable for any purpose by the operating system);  $2^s a$  is the physical address of byte 0 on the page; and n is the address space number (which must match the number in rV). The final three bits are the protection bits  $p_r p_w p_x$ ; the user needs  $p_r = 1$  to load from this page,  $p_w = 1$  to store on this page, and  $p_x = 1$  to execute instructions on this page. If n fails to match the number in rV, or if the appropriate protection bit is zero, a memory-protection fault occurs.

Page table entries should be writable only by the operating system. The 16 ignored bits of x imply that physical memory size is limited to  $2^{48}$  bytes (namely 256 large terabytes); that should be enough capacity for awhile, if not for the entire new millennium.

(5) A given 61-bit address A belongs to page  $\lfloor A/2^s \rfloor$  of its segment, and

$$\phi_i(A) = 2^s a + (A \bmod 2^s)$$

if a is the address in the PTE for page  $\lfloor A/2^s \rfloor$  of segment i.

(6) Suppose  $\lfloor A/2^s \rfloor$  is equal to  $(a_4a_3a_2a_1a_0)_{1024}$  in the radix-1024 number system. In the common case  $a_4 = a_3 = a_2 = a_1 = 0$ , the PTE is simply the octabyte  $m_8[2^{13}(r+b_i)+8a_0]$ ; this rule defines the mapping for the first 1024 pages. The next million or so pages are accessed through an auxiliary page table pointer

|       | 1 | 50 | 10 | 3 |
|-------|---|----|----|---|
| PTP = | 1 | c  | n  | q |

in  $m_8[2^{13}(r+b_i+1)+8a_1]$ ; here the sign must be 1 and the *n*-field must match rV, but the *q* bits are ignored. The desired PTE for page  $(a_1a_0)_{1024}$  is then in  $m_8[2^{13}c+8a_0]$ . The next billion or so pages, namely the pages  $(a_2a_1a_0)_{1024}$  with  $a_2 \neq 0$ , are accessed similarly, through an auxiliary PTP at level two; and so on.

Notice that if  $b_3 = b_4$ , there is just one page in segment 3, and its PTE appears all alone in physical location  $2^{13}(r+b_3)$ . Otherwise the PTEs appear in 1024-octabyte blocks. We usually have  $0 < b_1 < b_2 < b_3 < b_4$ , but the null case  $b_1 = b_2 = b_3 = b_4 = 0$  is worthy of mention: In this special case there is only one page, and the segment bits of a virtual address are ignored; the other 61 - s bits of each virtual address must be zero.

If s = 13,  $b_1 = 3$ ,  $b_2 = 2$ ,  $b_3 = 1$ , and  $b_4 = 0$ , there are at most  $2^{30}$  pages of 8192 bytes each, all belonging to segment 0. This is essentially the virtual memory setup in the Alpha 21064 computers with DIGITAL UNIX <sup>TM</sup>.
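The field layout of rV and the capacity rule for segments can be checked with a small C sketch. The accessor names below are hypothetical; only the bit positions come from the layout of rV given in rule (3). The last function returns the base-2 logarithm of the maximum number of pages in a segment, so the Alpha-like setting above should give 30 for segment 0 and "empty" for the others.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessors for the fields of rV (layout: b1 b2 b3 b4 s r n f). */
unsigned rv_b(uint64_t rv, unsigned i)      /* b_i for 0 <= i <= 4, with b_0 = 0 */
{
  assert(i <= 4);
  return i == 0 ? 0u : (unsigned)((rv >> (64 - 4 * i)) & 0xf);
}
unsigned rv_s(uint64_t rv) { return (unsigned)((rv >> 40) & 0xff); }     /* page size 2^s */
uint32_t rv_r(uint64_t rv) { return (uint32_t)((rv >> 13) & 0x7ffffff); } /* 27-bit root */
unsigned rv_n(uint64_t rv) { return (unsigned)((rv >> 3) & 0x3ff); }     /* address space */
unsigned rv_f(uint64_t rv) { return (unsigned)(rv & 0x7); }              /* function field */

/* log2 of the maximum number of pages in segment i, or -1 if the segment
   must be empty.  Segment i has at most 1024^(b_{i+1} - b_i) pages. */
int segment_log2_max_pages(uint64_t rv, unsigned i)
{
  unsigned lo, hi;
  assert(i <= 3);
  lo = rv_b(rv, i);
  hi = rv_b(rv, i + 1);
  if (lo > hi) return -1;          /* b_i > b_{i+1}: entirely empty */
  return 10 * (int)(hi - lo);      /* 0 means "at most one page" */
}
```

With rV = #32100d0000000000 (that is, $b_1 = 3$, $b_2 = 2$, $b_3 = 1$, $b_4 = 0$, s = 13, and all other fields zero) the sketch reports 30 for segment 0 and -1 for segments 1, 2, and 3.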

Several special cases have weird behavior, which probably isn't going to be useful. But I might as well mention them so that the flexibility of this scheme is clarified: If, for example,  $b_1 = 2$ ,  $b_2 = b_3 = 1$ , and  $b_4 = 5$ , then r + 1 is used both for PTPs of segment 0 and PTEs of segment 2. And if  $b_2 = b_3 < b_4$ , then  $r + b_2$  is used for the PTE of page 0 in segments 2 and 3; page 1 of segment 2 is not allowed, but there is a page 1 in segment 3.

I know these rules look extremely complicated, and I sincerely wish I could have found an alternative that would be both simple and efficient in practice. I tried various schemes based on hashing, but came to the conclusion that "trie" methods such as those described here are better for this application. Indeed, the page tables in most contemporary computers are based on very similar ideas, but with significantly smaller virtual addresses and without the shortcut for small page numbers. I tried also to find formats for rV and the page tables that would match byte boundaries in a more friendly way, but the corresponding page sizes did not work well. Fortunately these grungy details are almost always completely hidden from ordinary users.

Stack overflow presents a potential problem: If  $\gamma$  increases to a virtual address on a new page for which there is no permission to write, the protection interrupt handler would have no stack space in which to work! Therefore MMIX has a continuation register rC, which contains the physical address of a "continuation page." Pushed-down information is written to the continuation page until MMIX comes to an instruction that is safely interruptible. Then a stack overflow interrupt occurs, and the operating system can restore order. The format of rC is just like an ordinary PTE entry, except that the n field is ignored.

**46.** Of course MMIX can't afford to perform a lengthy calculation of physical addresses every time it accesses memory. The machine therefore maintains a *translation cache* (TC), which contains the translations of recently accessed pages. (In fact, there usually are two such caches, one for instructions and one for data.) A TC holds a set of 64-bit translation keys

| 1 | 2 | 61-s | s-13 | 10 | 3 |
|---|---|------|------|----|---|
| 0 | i | v    | 0    | n  | 0 |

associated with 38-bit translations

| 48-s | s-13 | 3 |
|------|------|---|
| a    | 0    | p |

representing the relevant parts of the PTE for page v of segment i. Different processes typically have different values of n, and possibly also different values of s. The operating system needs a way to keep such caches up to date when pages are being allocated, moved, swapped, or recycled. The operating system also likes to know which pages have been recently used. The LDVTS instructions facilitate such operations:

• LDVTS \$X,\$Y,\$Z|Z 'load virtual translation status'.

The sum \$Y + \$Z or \$Y + Z should have the form of a translation cache key as above, except that the rightmost three bits need not be zero. If this key is present in a TC, the rightmost three bits replace the current protection code p; however, if p is thereby set to zero, the key is removed from the TC. Register X is set to 0 if the key was not present in any translation cache, or to 1 if the key was present in the TC for instructions, or to 2 if the key was present in the TC for data, or to 3 if the key was present in both. This instruction is for the operating system only. (Changes to the TC are not immediate; so SYNC and/or SYNCD ought to be done when appropriate, as discussed in MMIX-PIPE.)
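To make the two bit layouts concrete, here is a minimal C sketch that builds a translation key from a nonnegative virtual address and a 38-bit translation from a PTE. The function names are hypothetical, and the code merely restates the field widths shown in the tables above; it says nothing about how a real translation cache is organized.

```c
#include <assert.h>
#include <stdint.h>

/* Key: 0 | i (2 bits) | v (61-s bits) | zeros (s-13 bits) | n (10 bits) | 000.
   Since i and v are the leading bits of the virtual address, it suffices to
   clear the low s bits of the address and insert n. */
uint64_t tc_key(uint64_t virt, unsigned s, unsigned n)
{
  uint64_t page_bits;
  assert(s >= 13 && s <= 48);
  assert((virt >> 63) == 0 && n < 1024);
  page_bits = virt & ~((UINT64_C(1) << s) - 1);
  return page_bits | ((uint64_t)n << 3);
}

/* Translation: a (48-s bits) | zeros (s-13 bits) | p (3 bits), taken from
   a PTE whose fields are x (16), a (48-s), y (s-13), n (10), p (3). */
uint64_t tc_translation(uint64_t pte, unsigned s)
{
  uint64_t low48, page_base;
  assert(s >= 13 && s <= 48);
  low48 = pte & ((UINT64_C(1) << 48) - 1);   /* drop the ignored x field */
  page_base = (low48 >> s) << s;             /* 2^s * a; y and n and p are cleared */
  return (page_base >> 10) | (pte & 7);      /* 38 bits in all */
}
```

For example, with s = 13 and n = 5, the address #6123 (page 3 of segment 0) has key #6028; a PTE of #ee02f yields the translation #3bf.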

47. We mentioned earlier that cheap versions of MMIX might calculate the physical addresses with software instead of hardware, using forced traps when the operating system needs to do page table calculations. Here is some code that could be used for such purposes; it defines the translation process precisely, given a nonnegative virtual address in register rYY. First we must unpack the fields of rV and compute the relevant base addresses for PTEs and PTPs:

```
        GET    virt,rYY
        GET    $7,rV            % $7=(virtual translation register)
        SRU    $1,virt,61       % $1=i (segment number of virtual address)
        SLU    $1,$1,2
        NEG    $1,52,$1         % $1=52-4i
        SRU    $1,$7,$1
        SLU    $2,$1,4
        SETL   $0,#f000
        AND    $1,$1,$0         % $1=b[i]<<12
        AND    $2,$2,$0         % $2=b[i+1]<<12
        SLU    $3,$7,24
        SRU    $3,$3,37
        SLU    $3,$3,13         % $3=(r field of rV)
        ORH    $3,#8000         % make $3 a physical address
        2ADDU  base,$1,$3       % base=address of first page table
        2ADDU  limit,$2,$3      % limit=address after last page table
        SRU    s,$7,40
        AND    s,s,#ff          % s=(s field of rV)
        CMP    $0,s,13
        BN     $0,Fail          % s must be 13 or more
        CMP    $0,s,49
        BNN    $0,Fail          % s must be 48 or less
        SETH   mask,#8000
        ORL    mask,#1ff8       % mask=(sign bit and n field)
        ORH    $7,#8000         % set sign bit for PTP validation below
        ANDNH  virt,#e000       % zero out the segment number
        SRU    $0,virt,s        % $0=a4a3a2a1a0 (page number of virt)
        ZSZ    $1,$0,1          % $1=[page number is zero]
        ADD    limit,limit,$1   % increase limit if page number is zero
        SETL   $6,#3ff
```

The next part of the routine finds the "digits" of the page number  $(a_4a_3a_2a_1a_0)_{1024}$ , from right to left:

```
        CMP $5,base,limit; SRU $1,$0,10; PBZ $1,1F
        AND $0,$0,$6; INCL base,#2000
        CMP $5,base,limit; SRU $2,$1,10; PBZ $2,2F
        AND $1,$1,$6; INCL base,#2000
        CMP $5,base,limit; SRU $3,$2,10; PBZ $3,3F
        AND $2,$2,$6; INCL base,#2000
        CMP $5,base,limit; SRU $4,$3,10; PBZ $4,4F
        AND $3,$3,$6; INCL base,#2000
```

Then the process cascades back through PTPs.

```
        CMP $5,base,limit
        BNN $5,Fail; 8ADDU $6,$4,base; LDO base,$6,0
        XOR $6,base,$7; AND $6,$6,mask; BNZ $6,Fail
        ANDNL base,#1fff
4H      BNN $5,Fail; 8ADDU $6,$3,base; LDO base,$6,0
        XOR $6,base,$7; AND $6,$6,mask; BNZ $6,Fail
        ANDNL base,#1fff
3H      BNN $5,Fail; 8ADDU $6,$2,base; LDO base,$6,0
        XOR $6,base,$7; AND $6,$6,mask; BNZ $6,Fail
        ANDNL base,#1fff
2H      BNN $5,Fail; 8ADDU $6,$1,base; LDO base,$6,0
        XOR $6,base,$7; AND $6,$6,mask; BNZ $6,Fail
```

Finally we obtain the PTE and communicate it to the machine. If errors have been detected, we set the translation to zero; actually any translation with permission bits zero would have the same effect.

```
        ANDNL  base,#1fff       % remove low 13 bits of PTP
1H      BNN    $5,Fail
        8ADDU  $6,$0,base
        LDO    base,$6,0        % base=PTE
        XOR    $6,base,$7
        ANDN   $6,$6,#7
        SLU    $6,$6,51
        PBZ    $6,Ready         % branch if n matches
Fail    SETL   base,0           % errors lead to PTE of zero
Ready   PUT    rZZ,base
        LDO    $255,IntMask     % load the desired setting of rK
        RESUME 1                % now the machine will digest the translation
```

All loads and stores in this program deal with negative virtual addresses. This effectively shuts off memory mapping and makes the page tables inaccessible to the user.

The program assumes that the ropcode in rXX is 3 (which it is when a forced trap is triggered by the need for virtual translation).
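For readers who find C easier to follow than MMIXAL, the same translation can be sketched as follows. This is an illustrative transcription, not part of MMIXware: the name `translate_to_pte`, the callback type `read_octa_fn`, and the toy memory `demo_m8` with its contents are all made up for the example. Physical addresses are passed to the callback without the sign bit that the MMIX code keeps in `base`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*read_octa_fn)(uint64_t phys);   /* hypothetical: reads m_8[phys] */

/* Returns the PTE for a nonnegative virtual address, or 0 on any failure. */
uint64_t translate_to_pte(uint64_t rv, uint64_t virt, read_octa_fn m8)
{
  const uint64_t sign = UINT64_C(1) << 63;
  unsigned i = (unsigned)(virt >> 61);                    /* segment number */
  uint64_t b_lo = i ? (rv >> (64 - 4 * i)) & 0xf : 0;     /* b_i, with b_0 = 0 */
  uint64_t b_hi = (rv >> (60 - 4 * i)) & 0xf;             /* b_{i+1} */
  unsigned s = (unsigned)((rv >> 40) & 0xff);
  uint64_t r = (rv >> 13) & 0x7ffffff;
  uint64_t n = (rv >> 3) & 0x3ff;
  uint64_t base = (r + b_lo) << 13;                       /* first page table */
  uint64_t limit = (r + b_hi) << 13;                      /* just past the last one */
  uint64_t page, digit[5], entry;
  int level = 0, k;

  assert((virt & sign) == 0);
  if (s < 13 || s > 48) return 0;
  page = (virt & ((UINT64_C(1) << 61) - 1)) >> s;
  if (page == 0) limit++;                                 /* allow the one-page case */
  for (k = 0; k < 5; k++) {                               /* radix-1024 digits a0..a4 */
    digit[k] = (page >> (10 * k)) & 0x3ff;
    if (digit[k]) level = k;                              /* leading nonzero digit */
  }
  base += (uint64_t)level << 13;
  if (base >= limit) return 0;
  for (k = level; k > 0; k--) {                           /* cascade back through PTPs */
    entry = m8(base + 8 * digit[k]);
    if (!(entry & sign) || ((entry >> 3) & 0x3ff) != n) return 0;
    base = entry & ~sign & ~UINT64_C(0x1fff);             /* 2^13 * c */
  }
  entry = m8(base + 8 * digit[0]);                        /* the PTE itself */
  return ((entry >> 3) & 0x3ff) == n ? entry : 0;
}

/* A tiny made-up physical memory for experiments with
   rV = #23330d0000020028 (b = 2,3,3,3; s = 13; r = #10; n = 5). */
uint64_t demo_m8(uint64_t phys)
{
  switch (phys) {
  case 0x20018: return 0xee02f;                      /* segment 0, page 3 */
  case 0x20020: return 0xee037;                      /* segment 0, page 4, wrong n */
  case 0x22010: return UINT64_C(0x8000000000060028); /* PTP for pages 2048..3071 */
  case 0x60008: return 0x13202c;                     /* segment 0, page 2049 */
  case 0x24000: return 0xaa02e;                      /* segment 1, page 0 */
  default: return 0;
  }
}
```

With the toy memory shown, address #6123 is resolved directly through the first page table, address #1002000 (page 2049) goes through one PTP, and page 4 fails because its n field does not match rV. The sketch tests `base < limit` once, after all increments of `base`; the MMIXAL code obtains the same effect by carrying the last comparison result in \$5.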

The translation from virtual pages to physical pages need not actually follow the rules for PTPs and PTEs; any other mapping could be substituted by operating systems with special needs. But people usually want compatibility between different implementations whenever possible. The only parts of rV that MMIX really needs are the s field, which defines page sizes, and the n field, which keeps TC entries of one process from being confused with the TC entries of another.

**48.** The complete instruction set. We have now described all of MMIX's special registers, except one: The special failure location register rF is set to a physical memory address when a parity error or other memory fault occurs. (The instruction leading to this error will probably be long gone before such a fault is detected; for example, the machine might be trying to write old data from a cache in order to make room for new data. Thus there is generally no connection between the current virtual program location rW and the physical location of a memory error. But knowledge of the latter location can still be useful for hardware repair, or when an operating system is booting up.)

**49.** One additional instruction proves to be useful.

• SWYM X,Y,Z 'sympathize with your machinery'.

This command lubricates the disk drives, fans, magnetic tape drives, laser printers, scanners, and any other mechanical equipment hooked up to MMIX, if necessary. Fields X, Y, and Z are ignored.

The SWYM command was originally included in MMIX's repertoire because machines occasionally need grease to keep in shape, just as human beings occasionally need to swim or do some other kind of exercise in order to maintain good muscle tone. But in fact, SWYM has turned out to be a "no-op," an instruction that does nothing at all; the hypothetical manufacturers of our hypothetical machine have pointed out that modern computer equipment is already well oiled and sealed for permanent use. Even so, a no-op instruction provides a good way for software to send signals to the hardware, for such things as scheduling the way instructions are issued on superscalar superpipelined buzzword-compliant machines. Software programs can also use no-ops to communicate with other programs like symbolic debuggers.

When a forced trap computes the translation rZZ of a virtual address rYY, ropcode 3 of RESUME 1 will put (rYY, rZZ) into the TC for instructions if the opcode in rXX is SWYM; otherwise (rYY, rZZ) will be put into the TC for data.

**50.** The running time of MMIX programs depends to a great extent on changes in technology. MMIX is a mythical machine, but its mythical hardware exists in cheap, slow versions as well as in costly high-performance models. Details of running time usually depend on things like the amount of main memory available to implement virtual memory, as well as the sizes of caches and other buffers.

For practical purposes, the running time of an MMIX program can often be estimated satisfactorily by assigning a fixed cost to each operation, based on the approximate running time that would be obtained on a high-performance machine with lots of main memory; so that's what we will do. Each operation will be assumed to take an integer number of v, where v (pronounced "oops") is a unit that represents the clock cycle time in a pipelined implementation. The value of v will probably decrease from year to year, but I'll keep calling it v. The running time will also depend on the number of memory references or mems that a program uses; this is the number of load and store instructions. For example, each LDO (load octa) instruction will be assumed to cost  $\mu + v$, where  $\mu$  is the average cost of a memory reference. The total running time of a program might be reported as, say,  $35\mu + 1000v$, meaning 35 mems plus 1000 oops. The ratio  $\mu/v$  will probably increase with time, so mem-counting is likely to become increasingly important. [See the discussion of mems in *The Stanford GraphBase* (New York: ACM Press, 1994).]

Integer addition, subtraction, and comparison all take just 1v. The same is true for SET, GET, PUT, SYNC, and SWYM instructions, as well as bitwise logical operations, shifts, relative jumps, comparisons, conditional assignments, and correctly predicted branches-not-taken or probable-branches-taken. Mispredicted branches or probable branches cost 3v, and so do the POP and GO commands. Integer multiplication takes 10v; integer division weighs in at 60v. TRAP, TRIP, and RESUME cost 5v each.

Most floating point operations have a nominal running time of 4v, although the comparison operators FCMP, FEQL, and FUN need only 1v. FDIV and FSQRT cost 40v each. The actual running time of floating point computations will vary depending on the operands; for example, the machine might need one extra v for each subnormal input or output, and it might slow down greatly when trips are enabled. The FREM instruction might typically cost  $(3+\delta)v$, where  $\delta$  is the amount by which the exponent of the first operand exceeds the exponent of the second (or zero, if this amount is negative). A floating point operation might take only 1v if at least one of its operands is zero, infinity, or NaN. However, the fixed values stated at the beginning of this paragraph will be used for all seat-of-the-pants estimates of running time, since we want to keep the estimates as simple as possible without making them terribly out of line.

All load and store operations will be assumed to cost  $\mu + v$ , except that CSWAP costs  $2\mu + 2v$ . (This applies to all OP codes that begin with #8, #9, #A, and #B, except #98-#9F and #B8-#BF. It's best to keep the rules simple, because  $\mu$  is just an approximate device for estimating average memory cost.) SAVE and UNSAVE are charged  $20\mu + v$ .

Of course we must remember that these numbers are very rough. We have not included the cost of fetching instructions from memory. Furthermore, an integer multiplication or division might have an effective cost of only 1v, if the result is not needed while other numbers are being calculated. Only a detailed simulation can be expected to be truly realistic.
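The bookkeeping behind such estimates is easy to mechanize. The following C sketch tallies mems and oops using the fixed costs quoted above; the grouping into cost classes and all of the names are assumptions made for this illustration only.

```c
#include <assert.h>

/* Hypothetical cost classes, with the fixed costs quoted in the text. */
typedef enum {
  C_SIMPLE,        /* add, subtract, compare, shifts, SET, GET, PUT, ...: 1v */
  C_MISPREDICT,    /* mispredicted branch or probable branch: 3v */
  C_POP_GO,        /* POP or GO: 3v */
  C_MUL,           /* integer multiplication: 10v */
  C_DIV,           /* integer division: 60v */
  C_TRAP,          /* TRAP, TRIP, RESUME: 5v */
  C_FLOAT,         /* most floating point operations: 4v */
  C_FDIV_FSQRT,    /* FDIV, FSQRT: 40v */
  C_LOAD_STORE,    /* loads and stores: mu + v */
  C_CSWAP,         /* CSWAP: 2mu + 2v */
  C_SAVE_UNSAVE,   /* SAVE, UNSAVE: 20mu + v */
  C_CLASSES
} cost_class;

typedef struct { long mems, oops; } run_cost;

static const run_cost unit_cost[C_CLASSES] = {
  {0, 1}, {0, 3}, {0, 3}, {0, 10}, {0, 60}, {0, 5},
  {0, 4}, {0, 40}, {1, 1}, {2, 2}, {20, 1}
};

run_cost make_cost(long mems, long oops)
{
  run_cost t;
  t.mems = mems; t.oops = oops;
  return t;
}

/* Add the cost of `count` operations of class c to a running total. */
run_cost add_ops(run_cost total, cost_class c, long count)
{
  assert((int)c < C_CLASSES && count >= 0);
  total.mems += count * unit_cost[c].mems;
  total.oops += count * unit_cost[c].oops;
  return total;
}

/* Express a total in units of v, given an assumed ratio mu/v. */
double cost_in_oops(run_cost t, double mu_over_v)
{
  return t.mems * mu_over_v + t.oops;
}
```

A loop of 100 iterations, each doing one LDO, one ADD, and one correctly predicted branch, would thus be charged 100μ + 300v. If one assumes μ/v = 10, the total of 35μ + 1000v mentioned earlier comes to 1350v.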

**51.** If you think that MMIX has plenty of operation codes, you are right; we have now described them all. Here is a chart that shows their numeric values:

|     | #0       | #1     | #2       | #3     | #4        | #5     | #6        | #7     |     |
|-----|----------|--------|----------|--------|-----------|--------|-----------|--------|-----|
| #0x | TRAP     | FCMP   | FUN      | FEQL   | FADD      | FIX    | FSUB      | FIXU   | #0x |
|     | FLOT[I]  |        | FLOTU[I] |        | SFLOT[I]  |        | SFLOTU[I] |        |     |
| #1x | FMUL     | FCMPE  | FUNE     | FEQLE  | FDIV      | FSQRT  | FREM      | FINT   | #1x |
|     | MUL[I]   |        | MULU[I]  |        | DIV[I]    |        | DIVU[I]   |        |     |
| #2x | ADD[I]   |        | ADDU[I]  |        | SUB[I]    |        | SUBU[I]   |        | #2x |
|     | 2ADDU[I] |        | 4ADDU[I] |        | 8ADDU[I]  |        | 16ADDU[I] |        |     |
| #3x | CMP[I]   |        | CMPU[I]  |        | NEG[I]    |        | NEGU[I]   |        | #3x |
|     | SL[I]    |        | SLU[I]   |        | SR[I]     |        | SRU[I]    |        |     |
| #4x | BN[B]    |        | BZ[B]    |        | BP[B]     |        | BOD[B]    |        | #4x |
|     | BNN[B]   |        | BNZ[B]   |        | BNP[B]    |        | BEV[B]    |        |     |
| #5x | PBN[B]   |        | PBZ[B]   |        | PBP[B]    |        | PBOD[B]   |        | #5x |
|     | PBNN[B]  |        | PBNZ[B]  |        | PBNP[B]   |        | PBEV[B]   |        |     |
| #6x | CSN[I]   |        | CSZ[I]   |        | CSP[I]    |        | CSOD[I]   |        | #6x |
|     | CSNN[I]  |        | CSNZ[I]  |        | CSNP[I]   |        | CSEV[I]   |        |     |
| #7x | ZSN[I]   |        | ZSZ[I]   |        | ZSP[I]    |        | ZSOD[I]   |        | #7x |
|     | ZSNN[I]  |        | ZSNZ[I]  |        | ZSNP[I]   |        | ZSEV[I]   |        |     |
| #8x | LDB[I]   |        | LDBU[I]  |        | LDW[I]    |        | LDWU[I]   |        | #8x |
|     | LDT[I]   |        | LDTU[I]  |        | LDO[I]    |        | LDOU[I]   |        |     |
| #9x | LDSF[I]  |        | LDHT[I]  |        | CSWAP[I]  |        | LDUNC[I]  |        | #9x |
|     | LDVTS[I] |        | PRELD[I] |        | PREGO[I]  |        | GO[I]     |        |     |
| #Ax | STB[I]   |        | STBU[I]  |        | STW[I]    |        | STWU[I]   |        | #Ax |
|     | STT[I]   |        | STTU[I]  |        | STO[I]    |        | STOU[I]   |        |     |
| #Bx | STSF[I]  |        | STHT[I]  |        | STCO[I]   |        | STUNC[I]  |        | #Bx |
|     | SYNCD[I] |        | PREST[I] |        | SYNCID[I] |        | PUSHGO[I] |        |     |
| #Cx | OR[I]    |        | ORN[I]   |        | NOR[I]    |        | XOR[I]    |        | #Cx |
|     | AND[I]   |        | ANDN[I]  |        | NAND[I]   |        | NXOR[I]   |        |     |
| #Dx | BDIF[I]  |        | WDIF[I]  |        | TDIF[I]   |        | ODIF[I]   |        | #Dx |
|     | MUX[I]   |        | SADD[I]  |        | MOR[I]    |        | MXOR[I]   |        |     |
| #Ex | SETH     | SETMH  | SETML    | SETL   | INCH      | INCMH  | INCML     | INCL   | #Ex |
|     | ORH      | ORMH   | ORML     | ORL    | ANDNH     | ANDNMH | ANDNML    | ANDNL  |     |
| #Fx | JMP[B]   |        | PUSHJ[B] |        | GETA[B]   |        | PUT[I]    |        | #Fx |
|     | POP      | RESUME | SAVE     | UNSAVE | SYNC      | SWYM   | GET       | TRIP   |     |
|     | #8       | #9     | #A       | #B     | #C        | #D     | #E        | #F     |     |

The notation '[I]' indicates an operation with an "immediate" variant in which the Z field denotes a constant instead of a register number. Similarly, '[B]' indicates an operation with a "backward" variant in which a relative address has a negative displacement. Simulators and other programs that need to present MMIX instructions in symbolic form will say that opcode #20 is ADD while opcode #21 is ADDI; they will say that #F2 is PUSHJ while #F3 is PUSHJB. But the MMIX assembler uses only the forms ADD and PUSHJ, not ADDI or PUSHJB.

To read this chart, use the hexadecimal digits at the top, bottom, left, and right. For example, operation code A9 in hexadecimal notation appears in the lower part of the  $^{\#}Ax$  row and in the  $^{\#}1/^{\#}9$  column; it is STTI, 'store tetrabyte immediate'.
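The chart coordinates of an opcode follow directly from its two hexadecimal digits, as this small C sketch shows. The function names are hypothetical. The last helper applies only to the two-column entries marked '[I]' or '[B]'.

```c
#include <assert.h>

/* Row label of the chart: the high hexadecimal digit of the opcode. */
unsigned chart_row(unsigned op) { assert(op <= 0xff); return op >> 4; }

/* Column 0..7: the low digit modulo 8 (columns are labeled #0..#7 on top
   and #8..#F on the bottom). */
unsigned chart_column(unsigned op) { assert(op <= 0xff); return op & 7; }

/* 1 if the opcode sits in the lower part of its row (low digit #8..#F). */
int chart_lower_half(unsigned op) { assert(op <= 0xff); return (int)((op >> 3) & 1); }

/* For a two-column entry marked [I] or [B], the odd opcode is the
   immediate or backward variant. */
int is_odd_variant(unsigned op) { assert(op <= 0xff); return (int)(op & 1); }
```

Opcode #A9, for example, lands in row #Ax, lower part, column 1, and is the odd (immediate) member of the STT[I] pair, in agreement with the reading STTI given above.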

**1.** Introduction. The subroutines below are used to simulate 64-bit MMIX arithmetic on an old-fashioned 32-bit computer, like the one the author had when he wrote MMIXAL and the first MMIX simulators in 1998 and 1999. All operations are fabricated from 32-bit arithmetic, including a full implementation of the IEEE floating point standard, assuming only that the C compiler has a 32-bit unsigned integer type.

Some day 64-bit machines will be commonplace and the awkward manipulations of the present program will look quite archaic. Interested readers who have such computers will be able to convert the code to a pure 64-bit form without difficulty, thereby obtaining much faster and simpler routines. Meanwhile, however, we can simulate the future and hope for continued progress.

This program module has a simple structure, intended to make it suitable for loading with MMIX simulators and assemblers.

```
#include <stdio.h>
#include <string.h>
#include <ctype.h>
⟨Stuff for C preprocessor 2⟩
typedef enum { false, true } bool;
⟨Tetrabyte and octabyte type definitions 3⟩
⟨Other type definitions 36⟩
⟨Global variables 4⟩
⟨Subroutines 5⟩
```

2. Subroutines of this program are declared first with a prototype, as in ANSI C, then with an old-style C function definition. Here are some preprocessor commands that make this work correctly with both new-style and old-style compilers.

```
⟨Stuff for C preprocessor 2⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
```

This code is used in section 1.

**3.** The definition of type **tetra** should be changed, if necessary, so that it represents an unsigned 32-bit integer.

```
⟨Tetrabyte and octabyte type definitions 3⟩ ≡
  typedef unsigned int tetra;  /* for systems conforming to the LP-64 data model */
  typedef struct {
    tetra h, l;  /* two tetrabytes make one octabyte */
  } octa;
```

This code is used in section 1.

**4.**

```
#define sign_bit ((unsigned) 0x80000000)

⟨Global variables 4⟩ ≡
  octa zero_octa;  /* zero_octa.h = zero_octa.l = 0 */
  octa neg_one = {-1, -1};  /* neg_one.h = neg_one.l = -1 */
  octa inf_octa = {0x7ff00000, 0};  /* floating point +∞ */
  octa standard_NaN = {0x7ff80000, 0};  /* floating point NaN(.5) */
```

See also sections 9, 30, 32, 69, and 75.

This code is used in section 1.

**5.** It's easy to add and subtract octabytes, if we aren't terribly worried about speed.

```
⟨Subroutines 5⟩ ≡
  octa oplus ARGS((octa, octa));
  octa oplus(y, z)  /* compute y + z */
    octa y, z;
  { octa x;
    x.h = y.h + z.h; x.l = y.l + z.l;
    if (x.l < y.l) x.h++;
    return x;
  }

  octa ominus ARGS((octa, octa));
  octa ominus(y, z)  /* compute y - z */
    octa y, z;
  { octa x;
    x.h = y.h - z.h; x.l = y.l - z.l;
    if (x.l > y.l) x.h--;
    return x;
  }
```

See also sections 6, 7, 8, 12, 13, 24, 25, 26, 27, 28, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46, 50, 54, 60, 61, 62, 68, 82, 85, 86, 88, 89, 91, and 93.

This code is used in section 1.
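Readers with a 64-bit machine can check the carry and borrow tests against native arithmetic. The sketch below restates the two routines with ANSI-style definitions; the conversion helpers `to_octa` and `from_octa`, the checker `agrees_with_native`, and the use of `uint32_t` for **tetra** are assumptions of this illustration, not part of the program above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;                 /* assumption: exactly 32 bits */
typedef struct { tetra h, l; } octa;    /* high and low tetrabytes */

octa to_octa(uint64_t v) { octa x; x.h = (tetra)(v >> 32); x.l = (tetra)v; return x; }
uint64_t from_octa(octa x) { return ((uint64_t)x.h << 32) | x.l; }

octa oplus(octa y, octa z)              /* compute y + z (mod 2^64) */
{
  octa x;
  x.h = y.h + z.h; x.l = y.l + z.l;
  if (x.l < y.l) x.h++;                 /* the low half wrapped around: carry */
  return x;
}

octa ominus(octa y, octa z)             /* compute y - z (mod 2^64) */
{
  octa x;
  x.h = y.h - z.h; x.l = y.l - z.l;
  if (x.l > y.l) x.h--;                 /* the low half wrapped around: borrow */
  return x;
}

/* 1 if both routines agree with native 64-bit arithmetic on (a, b). */
int agrees_with_native(uint64_t a, uint64_t b)
{
  return from_octa(oplus(to_octa(a), to_octa(b))) == a + b
      && from_octa(ominus(to_octa(a), to_octa(b))) == a - b;
}
```

The carry test works because an unsigned sum is smaller than either summand exactly when it has wrapped around; the borrow test is the mirror image.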

**6.** In the following subroutine, delta is a signed quantity that is assumed to fit in a signed tetrabyte.

```
⟨Subroutines 5⟩ +≡
  octa incr ARGS((octa, int));
  octa incr(y, delta)  /* compute y + δ */
    octa y;
    int delta;
  { octa x;
    x.h = y.h; x.l = y.l + delta;
    if (delta >= 0) {
      if (x.l < y.l) x.h++;
    } else if (x.l > y.l) x.h--;
    return x;
  }
```

7. Left and right shifts are only a bit more difficult.

```
⟨Subroutines 5⟩ +≡
  octa shift_left ARGS((octa, int));
  octa shift_left(y, s) /* shift left by s bits, where 0 <= s <= 64 */
       octa y;
       int s;
  {
    while (s >= 32) y.h = y.l, y.l = 0, s -= 32;
    if (s) { register tetra yhl = y.h << s, ylh = y.l >> (32 - s);
      y.h = yhl + ylh; y.l <<= s;
    }
    return y;
  }
  octa shift_right ARGS((octa, int, int));
  octa shift_right(y, s, u) /* shift right, arithmetically if u = 0 */
       octa y;
       int s, u;
  {
    while (s >= 32) y.l = y.h, y.h = (u ? 0 : -(y.h >> 31)), s -= 32;
    if (s) { register tetra yhl = y.h << (32 - s), ylh = y.l >> s;
      y.h = (u ? 0 : (-(y.h >> 31)) << (32 - s)) + (y.h >> s); y.l = yhl + ylh;
    }
    return y;
  }
```
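The two shift routines can be exercised in isolation. This sketch transcribes them into ANSI-style C with `<stdint.h>` types; `octa_t`, `shl64`, and `shr64` are hypothetical names chosen for the illustration. The first loop handles whole 32-bit moves, so the remaining shift count is always below 32 and no shift by the full word width occurs.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_t;  /* hypothetical stand-in for octa */

/* Shift left by s bits, 0 <= s <= 64. */
octa_t shl64(octa_t y, int s) {
    while (s >= 32) { y.h = y.l; y.l = 0; s -= 32; }
    if (s) {
        uint32_t yhl = y.h << s, ylh = y.l >> (32 - s);
        y.h = yhl + ylh;
        y.l <<= s;
    }
    return y;
}

/* Shift right by s bits; logical if u != 0, arithmetic if u == 0. */
octa_t shr64(octa_t y, int s, int u) {
    while (s >= 32) {
        y.l = y.h;
        y.h = u ? 0 : (uint32_t)-(y.h >> 31);   /* replicate the sign bit */
        s -= 32;
    }
    if (s) {
        uint32_t yhl = y.h << (32 - s), ylh = y.l >> s;
        uint32_t fill = u ? 0 : (uint32_t)(-(y.h >> 31)) << (32 - s);
        y.h = fill + (y.h >> s);
        y.l = yhl + ylh;
    }
    return y;
}
```

Shifting #8000000000000000 right by 4 gives #f800000000000000 arithmetically and #0800000000000000 logically.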

**8.** Multiplication. We need to multiply two unsigned 64-bit integers, obtaining an unsigned 128-bit product. It is easy to do this on a 32-bit machine by using Algorithm 4.3.1M of Seminumerical Algorithms, with  $b = 2^{16}$ .

The following subroutine returns the lower half of the product, and puts the upper half into a global octabyte called aux.

```
⟨Subroutines 5⟩ +≡
  octa omult ARGS((octa, octa));
  octa omult(y, z)
       octa y, z;
  {
    register int i, j, k;
    tetra u[4], v[4], w[8];
    register tetra t;
    octa acc;
    ⟨Unpack the multiplier and multiplicand to u and v 10⟩;
    for (j = 0; j < 4; j++) w[j] = 0;
    for (j = 0; j < 4; j++)
      if (!v[j]) w[j + 4] = 0;
      else {
        for (i = k = 0; i < 4; i++) {
          t = u[i] * v[j] + w[i + j] + k;
          w[i + j] = t & 0xffff, k = t >> 16;
        }
        w[j + 4] = k;
      }
    ⟨Pack w into the outputs aux and acc 11⟩;
    return acc;
  }

9. ⟨Global variables 4⟩ +≡
  octa aux; /* secondary output of subroutines with multiple outputs */
  bool overflow; /* set by certain subroutines for signed arithmetic */

10. ⟨Unpack the multiplier and multiplicand to u and v 10⟩ ≡
  u[3] = y.h >> 16, u[2] = y.h & 0xffff, u[1] = y.l >> 16, u[0] = y.l & 0xffff;
  v[3] = z.h >> 16, v[2] = z.h & 0xffff, v[1] = z.l >> 16, v[0] = z.l & 0xffff;
This code is used in section 8.

11. ⟨Pack w into the outputs aux and acc 11⟩ ≡
  aux.h = (w[7] << 16) + w[6], aux.l = (w[5] << 16) + w[4];
  acc.h = (w[3] << 16) + w[2], acc.l = (w[1] << 16) + w[0];
This code is used in section 8.
```
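The digit-by-digit scheme is easy to try out on its own. The sketch below follows the same base-$2^{16}$ schoolbook method but takes `uint64_t` arguments and returns the upper half through a pointer instead of the global `aux`; the name `mul_64x64` and that calling convention are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply y by z with 16-bit digits held in 32-bit variables.
   Returns the low 64 bits of the product; stores the high 64 bits in *hi. */
uint64_t mul_64x64(uint64_t y, uint64_t z, uint64_t *hi) {
    uint32_t u[4], v[4], w[8] = {0};
    uint32_t t, k;
    int i, j;
    for (i = 0; i < 4; i++) {
        u[i] = (uint32_t)(y >> (16 * i)) & 0xffff;
        v[i] = (uint32_t)(z >> (16 * i)) & 0xffff;
    }
    for (j = 0; j < 4; j++) {
        for (i = 0, k = 0; i < 4; i++) {
            /* At most 0xffff*0xffff + 0xffff + 0xffff = 0xffffffff: no overflow. */
            t = u[i] * v[j] + w[i + j] + k;
            w[i + j] = t & 0xffff;
            k = t >> 16;
        }
        w[j + 4] = k;
    }
    *hi = ((uint64_t)w[7] << 48) | ((uint64_t)w[6] << 32)
        | ((uint64_t)w[5] << 16) | w[4];
    return ((uint64_t)w[3] << 48) | ((uint64_t)w[2] << 32)
         | ((uint64_t)w[1] << 16) | w[0];
}
```

Since $(2^{64}-1)^2 = 2^{128} - 2^{65} + 1$, squaring the largest octabyte should give an upper half of $2^{64}-2$ and a lower half of 1.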

12. Signed multiplication has the same lower half product as unsigned multiplication. The signed upper half product is obtained with at most two further subtractions, after which the result has overflowed if and only if the upper half is unequal to 64 copies of the sign bit in the lower half.

```
⟨Subroutines 5⟩ +≡
  octa signed_omult ARGS((octa, octa));
  octa signed_omult(y, z)
       octa y, z;
  {
    octa acc;
    acc = omult(y, z);
    if (y.h & sign_bit) aux = ominus(aux, z);
    if (z.h & sign_bit) aux = ominus(aux, y);
    overflow = (aux.h != aux.l || (aux.h ^ (aux.h >> 1) ^ (acc.h & sign_bit)));
    return acc;
  }
```

**13.** Division. Long division of an unsigned 128-bit integer by an unsigned 64-bit integer is, of course, one of the most challenging routines needed for MMIX arithmetic. The following program, based on Algorithm 4.3.1D of Seminumerical Algorithms, computes octabytes q and r such that $(2^{64}x + y) = qz + r$ and $0 \le r < z$, given octabytes x, y, and z, assuming that x < z. (If $x \ge z$, it simply sets q = x and r = y.) The quotient q is returned by the subroutine; the remainder r is stored in aux.

```
⟨Subroutines 5⟩ +≡
  octa odiv ARGS((octa, octa, octa));
  octa odiv(x, y, z)
       octa x, y, z;
  {
    register int i, j, k, n, d;
    tetra u[8], v[4], q[4], mask, qhat, rhat, vh, vmh;
    register tetra t;
    octa acc;
    ⟨Check that x < z; otherwise give trivial answer 14⟩;
    ⟨Unpack the dividend and divisor to u and v 15⟩;
    ⟨Determine the number of significant places n in the divisor v 16⟩;
    ⟨Normalize the divisor 17⟩;
    for (j = 3; j >= 0; j--) ⟨Determine the quotient digit q[j] 20⟩;
    ⟨Unnormalize the remainder 18⟩;
    ⟨Pack q and u to acc and aux 19⟩;
    return acc;
  }

14. ⟨Check that x < z; otherwise give trivial answer 14⟩ ≡
  if (x.h > z.h || (x.h == z.h && x.l >= z.l)) {
    aux = y; return x;
  }
This code is used in section 13.

15. ⟨Unpack the dividend and divisor to u and v 15⟩ ≡
  u[7] = x.h >> 16, u[6] = x.h & 0xffff, u[5] = x.l >> 16, u[4] = x.l & 0xffff;
  u[3] = y.h >> 16, u[2] = y.h & 0xffff, u[1] = y.l >> 16, u[0] = y.l & 0xffff;
  v[3] = z.h >> 16, v[2] = z.h & 0xffff, v[1] = z.l >> 16, v[0] = z.l & 0xffff;
This code is used in section 13.

16. ⟨Determine the number of significant places n in the divisor v 16⟩ ≡
  for (n = 4; v[n - 1] == 0; n--);
This code is used in section 13.
```

**17.** We shift u and v left by d places, where d is chosen to make $2^{15} \le v_{n-1} < 2^{16}$.

```
⟨Normalize the divisor 17⟩ ≡
  vh = v[n - 1];
  for (d = 0; vh < 0x8000; d++, vh <<= 1);
  for (j = k = 0; j < n + 4; j++) {
    t = (u[j] << d) + k;
    u[j] = t & 0xffff, k = t >> 16;
  }
  for (j = k = 0; j < n; j++) {
    t = (v[j] << d) + k;
    v[j] = t & 0xffff, k = t >> 16;
  }
  vh = v[n - 1];
  vmh = (n > 1 ? v[n - 2] : 0);
This code is used in section 13.

18. ⟨Unnormalize the remainder 18⟩ ≡
  mask = (1 << d) - 1;
  for (j = 3; j >= n; j--) u[j] = 0;
  for (k = 0; j >= 0; j--) {
    t = (k << 16) + u[j];
    u[j] = t >> d, k = t & mask;
  }
This code is used in section 13.

19. ⟨Pack q and u to acc and aux 19⟩ ≡
  acc.h = (q[3] << 16) + q[2], acc.l = (q[1] << 16) + q[0];
  aux.h = (u[3] << 16) + u[2], aux.l = (u[1] << 16) + u[0];
This code is used in section 13.

20. ⟨Determine the quotient digit q[j] 20⟩ ≡
  {
    ⟨Find the trial quotient, q̂ 21⟩;
    ⟨Subtract b^j q̂ v from u 22⟩;
    ⟨If the result was negative, decrease q̂ by 1 23⟩;
    q[j] = qhat;
  }
This code is used in section 13.

21. ⟨Find the trial quotient, q̂ 21⟩ ≡
  t = (u[j + n] << 16) + u[j + n - 1];
  qhat = t / vh, rhat = t - vh * qhat;
  if (n > 1)
    while (qhat == 0x10000 || qhat * vmh > (rhat << 16) + u[j + n - 2]) {
      qhat--, rhat += vh;
      if (rhat >= 0x10000) break;
    }
This code is used in section 20.
```

**22.** After this step, u[j+n] will either equal k or k-1. The true value of u would be obtained by subtracting k from u[j+n]; but we don't have to fuss over u[j+n], because it won't be examined later.

```
⟨Subtract b^j q̂ v from u 22⟩ ≡
  for (i = k = 0; i < n; i++) {
    t = u[i + j] + 0xffff0000 - k - qhat * v[i];
    u[i + j] = t & 0xffff, k = 0xffff - (t >> 16);
  }
```

This code is used in section 20.

**23.** The correction here occurs only rarely, but it can be necessary; for example, when dividing the number #7fff800100000000 by #800080020005.

```
⟨If the result was negative, decrease q̂ by 1 23⟩ ≡
  if (u[j + n] != k) {
    qhat--;
    for (i = k = 0; i < n; i++) {
      t = u[i + j] + v[i] + k;
      u[i + j] = t & 0xffff, k = t >> 16;
    }
  }
```

This code is used in section 20.
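A slow but transparent cross-check for the division routine is ordinary shift-and-subtract division, one quotient bit per step. This is not Algorithm 4.3.1D; it is a deliberately simpler technique swapped in for testing, and the name `div_128_by_64` and its pointer-based remainder are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Divide the 128-bit value (x * 2^64 + y) by z, one bit at a time.
   Returns the quotient and stores the remainder in *rem.
   If x >= z the quotient would not fit in 64 bits, so the trivial
   answer q = x, r = y described in the text is returned instead. */
uint64_t div_128_by_64(uint64_t x, uint64_t y, uint64_t z, uint64_t *rem) {
    uint64_t q = 0, r = x;
    int i;
    if (x >= z) { *rem = y; return x; }
    for (i = 63; i >= 0; i--) {
        int carry = (int)(r >> 63);        /* bit about to be shifted out of r */
        r = (r << 1) | ((y >> i) & 1);     /* bring down the next dividend bit */
        q <<= 1;
        if (carry || r >= z) {             /* true partial remainder is >= z */
            r -= z;
            q |= 1;
        }
    }
    *rem = r;
    return q;
}
```

Because the invariant r < z holds before every step, the doubled remainder is below 2z, and a single conditional subtraction restores the invariant even when the doubling overflows 64 bits.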

**24.** Signed division can be reduced to unsigned division in a tedious but straightforward manner. We assume that the divisor isn't zero.

```
⟨Subroutines 5⟩ +≡
  octa signed_odiv ARGS((octa, octa));
  octa signed_odiv(y, z)
       octa y, z;
  {
    octa yy, zz, q;
    register int sy, sz;
    if (y.h & sign_bit) sy = 2, yy = ominus(zero_octa, y);
    else sy = 0, yy = y;
    if (z.h & sign_bit) sz = 1, zz = ominus(zero_octa, z);
    else sz = 0, zz = z;
    q = odiv(zero_octa, yy, zz);
    overflow = false;
    switch (sy + sz) {
    case 2 + 1: aux = ominus(zero_octa, aux);
      if (q.h == sign_bit) overflow = true;
    case 0 + 0: return q;
    case 2 + 0: if (aux.h || aux.l) aux = ominus(zz, aux);
      goto negate_q;
    case 0 + 1: if (aux.h || aux.l) aux = ominus(aux, zz);
    negate_q: if (aux.h || aux.l) return ominus(neg_one, q);
      else return ominus(zero_octa, q);
    }
  }
```
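Tracing the four sign cases shows that the quotient is rounded toward minus infinity and that a nonzero remainder takes the sign of the divisor. A minimal sketch of the same convention on native 64-bit integers follows; `floor_div` is a hypothetical helper, and it assumes z is nonzero and excludes the overflowing case of the most negative dividend with z = -1.

```c
#include <assert.h>
#include <stdint.h>

/* Floored division: q = floor(y / z), and y = q*z + r with r having
   the sign of z (or r = 0). C's own / and % truncate toward zero, so
   one adjustment is needed when the signs of r and z differ. */
int64_t floor_div(int64_t y, int64_t z, int64_t *rem) {
    int64_t q = y / z;
    int64_t r = y % z;
    if (r != 0 && ((r < 0) != (z < 0))) {
        q -= 1;
        r += z;
    }
    *rem = r;
    return q;
}
```

For example, dividing -7 by 2 yields quotient -4 and remainder 1, which matches case 2 + 0 of the switch: the unsigned quotient 3 becomes -1 - 3 = -4 and the remainder becomes 2 - 1 = 1.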

25. Bit fiddling. The bitwise operators of MMIX are fairly easy to implement directly, but three of them occur often enough to deserve packaging as subroutines.

```
⟨Subroutines 5⟩ +≡
  octa oand ARGS((octa, octa));
  octa oand(y, z) /* compute y ∧ z */
       octa y, z;
  { octa x;
    x.h = y.h & z.h; x.l = y.l & z.l;
    return x;
  }
  octa oandn ARGS((octa, octa));
  octa oandn(y, z) /* compute y ∧ z̄ */
       octa y, z;
  { octa x;
    x.h = y.h & ~z.h; x.l = y.l & ~z.l;
    return x;
  }
  octa oxor ARGS((octa, octa));
  octa oxor(y, z) /* compute y ⊕ z */
       octa y, z;
  { octa x;
    x.h = y.h ^ z.h; x.l = y.l ^ z.l;
    return x;
  }
```

**26.** Here's a fun way to count the number of bits in a tetrabyte. [This classical trick is called the "Gillies–Miller method for sideways addition" in *The Preparation of Programs for an Electronic Digital Computer* by Wilkes, Wheeler, and Gill, second edition (Reading, Mass.: Addison–Wesley, 1957), 191–193. Some of the tricks used here were suggested by Balbir Singh, Peter Rossmanith, and Stefan Schwoon.]

```
⟨Subroutines 5⟩ +≡
  int count_bits ARGS((tetra));
  int count_bits(x)
       tetra x;
  {
    register int xx = x;
    xx = xx - ((xx >> 1) & 0x55555555);
    xx = (xx & 0x33333333) + ((xx >> 2) & 0x33333333);
    xx = (xx + (xx >> 4)) & 0x0f0f0f0f;
    xx = xx + (xx >> 8);
    return (xx + (xx >> 16)) & 0xff;
  }
```
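The sideways addition can be compared with a plain bit-by-bit count. In this sketch the working variable is an unsigned `uint32_t` rather than the `register int` of the text, which sidesteps implementation-defined right shifts of negative values; `popcount32` and `popcount_ref` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Sideways addition: sum adjacent 1-bit, 2-bit, 4-bit, ... fields in parallel. */
int popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555);                 /* 2-bit sums */
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0f0f0f0f;                 /* 8-bit sums */
    x = x + (x >> 8);
    return (int)((x + (x >> 16)) & 0xff);
}

/* Reference: examine one bit at a time. */
int popcount_ref(uint32_t x) {
    int n = 0;
    while (x) { n += (int)(x & 1); x >>= 1; }
    return n;
}
```

The final mask with #ff suffices because the total never exceeds 32.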

**27.** To compute the nonnegative byte differences of two given tetrabytes, we can carry out the following 20-step branchless computation:

```
⟨Subroutines 5⟩ +≡
  tetra byte_diff ARGS((tetra, tetra));
  tetra byte_diff(y, z)
       tetra y, z;
  {
    register tetra d = (y & 0x00ff00ff) + 0x01000100 - (z & 0x00ff00ff);
    register tetra m = d & 0x01000100;
    register tetra x = d & (m - (m >> 8));
    d = ((y >> 8) & 0x00ff00ff) + 0x01000100 - ((z >> 8) & 0x00ff00ff);
    m = d & 0x01000100;
    return x + ((d & (m - (m >> 8))) << 8);
  }
```
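To see what the branchless computation delivers, it helps to set it beside a byte-at-a-time loop. The sketch below assumes `uint32_t` for `tetra`; `byte_diff32` is a direct transcription and `byte_diff_ref` is a hypothetical reference version that computes max(0, y_i - z_i) for each byte.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless version: even-numbered bytes first, then odd-numbered bytes.
   Adding 0x100 to each 16-bit lane keeps the lanes from borrowing across,
   and bit 8 of a lane ends up set exactly when that byte of y is >= z's. */
uint32_t byte_diff32(uint32_t y, uint32_t z) {
    uint32_t d = (y & 0x00ff00ff) + 0x01000100 - (z & 0x00ff00ff);
    uint32_t m = d & 0x01000100;
    uint32_t x = d & (m - (m >> 8));      /* keep a lane only if its flag is set */
    d = ((y >> 8) & 0x00ff00ff) + 0x01000100 - ((z >> 8) & 0x00ff00ff);
    m = d & 0x01000100;
    return x + ((d & (m - (m >> 8))) << 8);
}

/* Reference: saturating subtraction, one byte at a time. */
uint32_t byte_diff_ref(uint32_t y, uint32_t z) {
    uint32_t x = 0;
    int i;
    for (i = 0; i < 32; i += 8) {
        uint32_t yb = (y >> i) & 0xff, zb = (z >> i) & 0xff;
        if (yb > zb) x |= (yb - zb) << i;
    }
    return x;
}
```

With y = #10203040 and z = #01304050 only the leading byte of y exceeds its partner, so the result is #0f000000.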

28. To compute the nonnegative wyde differences of two tetrabytes, another trick leads to a 15-step branchless computation. (Research problem: Can *count\_bits*, byte\_diff, or wyde\_diff be done with fewer operations?)

```
⟨Subroutines 5⟩ +≡
  tetra wyde_diff ARGS((tetra, tetra));
  tetra wyde_diff(y, z)
       tetra y, z;
  {
    register tetra a = ((y >> 16) - (z >> 16)) & 0x10000;
    register tetra b = ((y & 0xffff) - (z & 0xffff)) & 0x10000;
    return y - (z ^ ((y ^ z) & (b - a - (b >> 16))));
  }
```

29. The last bitwise subroutine we need is the most interesting: It implements MMIX's MOR and MXOR operations.

```
⟨Subroutines 5⟩ +≡
  octa bool_mult ARGS((octa, octa, bool));
  octa bool_mult(y, z, xor)
       octa y, z; /* the operands */
       bool xor; /* do we do xor instead of or? */
  {
    octa o, x;
    register tetra a, b, c;
    register int k;
    for (k = 0, o = y, x = zero_octa; o.h || o.l; k++, o = shift_right(o, 8, 1))
      if (o.l & 0xff) {
        a = ((z.h >> k) & 0x01010101) * 0xff;
        b = ((z.l >> k) & 0x01010101) * 0xff;
        c = (o.l & 0xff) * 0x01010101;
        if (xor) x.h ^= a & c, x.l ^= b & c;
        else x.h |= a & c, x.l |= b & c;
      }
    return x;
  }
```
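The same loop can be written over a single 64-bit word, which makes it convenient to experiment with particular bit matrices. This is a sketch under that assumption; `bool_mult64` and its `use_xor` flag are illustrative names. Step k takes byte k of y, replicates it into all eight byte positions, and keeps it in exactly those bytes of the result whose counterpart in z has bit k set.

```c
#include <assert.h>
#include <stdint.h>

/* Boolean matrix product of two 8x8 bit matrices packed into 64 bits.
   With use_xor == 0 contributions are combined by OR (as in MOR),
   otherwise by XOR (as in MXOR). */
uint64_t bool_mult64(uint64_t y, uint64_t z, int use_xor) {
    uint64_t x = 0, a, c;
    int k;
    for (k = 0; y != 0; k++, y >>= 8) {
        if (y & 0xff) {
            a = ((z >> k) & 0x0101010101010101ULL) * 0xff;  /* 0xff where bit k of z's byte is 1 */
            c = (y & 0xff) * 0x0101010101010101ULL;         /* byte k of y in every position */
            if (use_xor) x ^= a & c;
            else x |= a & c;
        }
    }
    return x;
}
```

Taking z = #8040201008040201 (byte k holds only bit k) reproduces y, while z = #0102040810204080 reverses the order of the bytes of y.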

**30.** Floating point packing and unpacking. Standard IEEE floating binary numbers pack a sign, exponent, and fraction into a tetrabyte or octabyte. In this section we consider basic subroutines that convert between IEEE format and the separate unpacked components.

```
#define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4

⟨Global variables 4⟩ +≡
int cur_round; /* the current rounding mode */
```

**31.** The *fpack* routine takes an octabyte f, a raw exponent e, and a sign s, and packs them into the floating binary number that corresponds to $\pm 2^{e-1076} f$, using a given rounding mode. The value of f should satisfy $2^{54} \le f \le 2^{55}$.

Exceptional events are noted by oring appropriate bits into the global variable exceptions. Special considerations apply to underflow, which is not fully specified by Section 7.4 of the IEEE standard: Implementations of the standard are free to choose between two definitions of "tininess" and two definitions of "accuracy loss." MMIX determines tininess after rounding, hence a result with e < 0 is not necessarily tiny; MMIX treats accuracy loss as equivalent to inexactness. Thus, a result underflows if and only if it is tiny and either (i) it is inexact or (ii) the underflow trap is enabled. The fpack routine sets U\_BIT in exceptions if and only if the result is tiny, X\_BIT if and only if the result is inexact.

```
#define X_BIT (1 << 8)  /* floating inexact */
#define Z_BIT (1 << 9)  /* floating division by zero */
#define U_BIT (1 << 10) /* floating underflow */
#define O_BIT (1 << 11) /* floating overflow */
#define I_BIT (1 << 12) /* floating invalid operation */
#define W_BIT (1 << 13) /* float-to-fix overflow */
#define V_BIT (1 << 14) /* integer overflow */
#define D_BIT (1 << 15) /* integer divide check */
#define E_BIT (1 << 18) /* external (dynamic) trap bit */

⟨Subroutines 5⟩ +≡
  octa fpack ARGS((octa, int, char, int));
  octa fpack(f, e, s, r)
       octa f; /* the normalized fraction part */
       int e; /* the raw exponent */
       char s; /* the sign */
       int r; /* the rounding mode */
  {
    octa o;
    if (e > 0x7fd) e = 0x7ff, o = zero_octa;
    else {
      if (e < 0) {
        if (e < -54) o.h = 0, o.l = 1;
        else { octa oo;
          o = shift_right(f, -e, 1);
          oo = shift_left(o, -e);
          if (oo.l != f.l || oo.h != f.h) o.l |= 1; /* sticky bit */
        }
        e = 0;
      } else o = f;
    }
    ⟨Round and return the result 33⟩;
  }
```

**32.**

```
⟨Global variables 4⟩ +≡
  int exceptions; /* bits possibly destined for rA */
```

**33.** Everything falls together so nicely here, it's almost too good to be true!

```
⟨Round and return the result 33⟩ ≡
  if (o.l & 3) exceptions |= X_BIT;
  switch (r) {
  case ROUND_DOWN: if (s == '-') o = incr(o, 3); break;
  case ROUND_UP: if (s != '-') o = incr(o, 3);
  case ROUND_OFF: break;
  case ROUND_NEAR: o = incr(o, o.l & 4 ? 2 : 1); break;
  }
  o = shift_right(o, 2, 1);
  o.h += e << 20;
  if (o.h >= 0x7ff00000) exceptions |= O_BIT + X_BIT; /* overflow */
  else if (o.h < 0x100000) exceptions |= U_BIT; /* tininess */
  if (s == '-') o.h |= sign_bit;
  return o;
This code is used in section 31.
```
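The rounding step works on a value that carries two extra low-order bits: the next bit of the true result and a sticky bit. The following sketch isolates that step on a plain `uint64_t`; `round_two_bits` and its `negative` flag are hypothetical, while the mode constants repeat the values defined in section 30.

```c
#include <assert.h>
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP = 2, ROUND_DOWN = 3, ROUND_NEAR = 4 };

/* o holds the magnitude scaled by 4; the result is the rounded magnitude. */
uint64_t round_two_bits(uint64_t o, int mode, int negative) {
    switch (mode) {
    case ROUND_DOWN: if (negative) o += 3; break;    /* toward -infinity */
    case ROUND_UP: if (!negative) o += 3; break;     /* toward +infinity */
    case ROUND_OFF: break;                           /* toward zero */
    case ROUND_NEAR: o += (o & 4) ? 2 : 1; break;    /* ties go to even */
    }
    return o >> 2;
}
```

In round-to-nearest mode an odd integer part gets +2, so an exact half (low bits 10) rounds up to the even neighbor; an even integer part gets +1, so an exact half stays put and only the pattern 11 rounds up. Thus 6/4 = 1.5 becomes 2 and 10/4 = 2.5 also becomes 2.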

**34.** Similarly, *sfpack* packs a short float, from inputs having the same conventions as *fpack*.

```
⟨Subroutines 5⟩ +≡
  tetra sfpack ARGS((octa, int, char, int));
  tetra sfpack(f, e, s, r)
       octa f; /* the fraction part */
       int e; /* the raw exponent */
       char s; /* the sign */
       int r; /* the rounding mode */
  {
    register tetra o;
    if (e > 0x47d) e = 0x47f, o = 0;
    else {
      o = shift_left(f, 3).h;
      if (f.l & 0x1fffffff) o |= 1;
      if (e < 0x380) {
        if (e < 0x380 - 25) o = 1;
        else { register tetra o0, oo;
          o0 = o;
          o = o >> (0x380 - e);
          oo = o << (0x380 - e);
          if (oo != o0) o |= 1; /* sticky bit */
        }
        e = 0x380;
      }
    }
    ⟨Round and return the short result 35⟩;
  }

35. ⟨Round and return the short result 35⟩ ≡
  if (o & 3) exceptions |= X_BIT;
  switch (r) {
  case ROUND_DOWN: if (s == '-') o += 3; break;
  case ROUND_UP: if (s != '-') o += 3;
  case ROUND_OFF: break;
  case ROUND_NEAR: o += (o & 4 ? 2 : 1); break;
  }
  o = o >> 2;
  o += (e - 0x380) << 23;
  if (o >= 0x7f800000) exceptions |= O_BIT + X_BIT; /* overflow */
  else if (o < 0x100000) exceptions |= U_BIT; /* tininess */
  if (s == '-') o |= sign_bit;
  return o;
```

This code is used in section 34.

**36.** The funpack routine is, roughly speaking, the opposite of fpack. It takes a given floating point number x and separates out its fraction part f, exponent e, and sign s. It clears exceptions to zero. It returns the type of value found: zro, num, inf, or nan. When it returns num, it will have set f, e, and s to the values from which fpack would produce the original number x without exceptions.

```
#define zero_exponent (-1000) /* zero is assumed to have this exponent */

⟨Other type definitions 36⟩ ≡
  typedef enum {
    zro, num, inf, nan
  } ftype;
See also section 59.
This code is used in section 1.

37. ⟨Subroutines 5⟩ +≡
  ftype funpack ARGS((octa, octa *, int *, char *));
  ftype funpack(x, f, e, s)
       octa x; /* the given floating point value */
       octa *f; /* address where the fraction part should be stored */
       int *e; /* address where the exponent part should be stored */
       char *s; /* address where the sign should be stored */
  {
    register int ee;
    exceptions = 0;
    *s = (x.h & sign_bit ? '-' : '+');
    *f = shift_left(x, 2);
    f->h &= 0x3fffff;
    ee = (x.h >> 20) & 0x7ff;
    if (ee) {
      *e = ee - 1;
      f->h |= 0x400000;
      return (ee < 0x7ff ? num : f->h == 0x400000 && !f->l ? inf : nan);
    }
    if (!x.l && !f->h) {
      *e = zero_exponent; return zro;
    }
    do { ee--; *f = shift_left(*f, 1); } while (!(f->h & 0x400000));
    *e = ee; return num;
  }
```
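The unpacked form can be inspected for ordinary C doubles. The sketch below mirrors the unpacking logic on a 64-bit word and assumes that `double` is IEEE 754 binary64; the enumeration names `F_ZERO`, `F_NUM`, `F_INF`, `F_NAN` and the function `unpack_double` are hypothetical, chosen to avoid clashing with `<math.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { F_ZERO, F_NUM, F_INF, F_NAN } fkind;

/* Split d into sign s, raw exponent e, and fraction f so that, for
   F_NUM results, |d| = 2^(e-1076) * f with 2^54 <= f < 2^55. */
fkind unpack_double(double d, uint64_t *f, int *e, char *s) {
    uint64_t x;
    int ee;
    memcpy(&x, &d, sizeof x);                  /* reinterpret the bits */
    *s = (x >> 63) ? '-' : '+';
    *f = (x << 2) & 0x003fffffffffffffULL;     /* 52 fraction bits, shifted left 2 */
    ee = (int)((x >> 52) & 0x7ff);
    if (ee) {
        *e = ee - 1;
        *f |= 1ULL << 54;                      /* restore the hidden bit */
        if (ee < 0x7ff) return F_NUM;
        return *f == (1ULL << 54) ? F_INF : F_NAN;
    }
    if (*f == 0) { *e = -1000; return F_ZERO; }
    do { ee--; *f <<= 1; } while (!(*f & (1ULL << 54)));   /* normalize a subnormal */
    *e = ee;
    return F_NUM;
}
```

For 1.0 the stored exponent field is #3ff, so e = #3fe = 1022 and f = $2^{54}$, giving $2^{1022-1076}\cdot 2^{54} = 1$ as expected.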

```
38. ⟨Subroutines 5⟩ +≡
  ftype sfunpack ARGS((tetra, octa *, int *, char *));
  ftype sfunpack(x, f, e, s)
       tetra x; /* the given floating point value */
       octa *f; /* address where the fraction part should be stored */
       int *e; /* address where the exponent part should be stored */
       char *s; /* address where the sign should be stored */
  {
    register int ee;
    exceptions = 0;
    *s = (x & sign_bit ? '-' : '+');
    f->h = (x >> 1) & 0x3fffff, f->l = x << 31;
    ee = (x >> 23) & 0xff;
    if (ee) {
      *e = ee + 0x380 - 1;
      f->h |= 0x400000;
      return (ee < 0xff ? num : (x & 0x7fffffff) == 0x7f800000 ? inf : nan);
    }
    if (!(x & 0x7fffffff)) {
      *e = zero_exponent; return zro;
    }
    do { ee--; *f = shift_left(*f, 1); } while (!(f->h & 0x400000));
    *e = ee + 0x380; return num;
  }
```

**39.** Since MMIX downplays 32-bit operations, it uses sfpack and sfunpack only when loading and storing short floats, or when converting from fixed point to floating point.

```
⟨Subroutines 5⟩ +≡
  octa load_sf ARGS((tetra));
  octa load_sf(z)
       tetra z; /* 32 bits to be loaded into a 64-bit register */
  {
    octa f, x; int e; char s; ftype t;
    t = sfunpack(z, &f, &e, &s);
    switch (t) {
    case zro: x = zero_octa; break;
    case num: return fpack(f, e, s, ROUND_OFF);
    case inf: x = inf_octa; break;
    case nan: x = shift_right(f, 2, 1); x.h |= 0x7ff00000; break;
    }
    if (s == '-') x.h |= sign_bit;
    return x;
  }
```

```
40. ⟨Subroutines 5⟩ +≡
  tetra store_sf ARGS((octa));
  tetra store_sf(x)
       octa x; /* 64 bits to be loaded into a 32-bit word */
  {
    octa f; tetra z; int e; char s; ftype t;
    t = funpack(x, &f, &e, &s);
    switch (t) {
    case zro: z = 0; break;
    case num: return sfpack(f, e, s, cur_round);
    case inf: z = 0x7f800000; break;
    case nan: if (!(f.h & 0x200000)) {
        f.h |= 0x200000; exceptions |= I_BIT; /* NaN was signaling */
      }
      z = 0x7f800000 | (f.h << 1) | (f.l >> 31); break;
    }
    if (s == '-') z |= sign_bit;
    return z;
  }
```

41. Floating multiplication and division. The hardest fixed point operations were multiplication and division; but these two operations are the *easiest* to implement in floating point arithmetic, once their fixed point counterparts are available.

```
⟨Subroutines 5⟩ +≡
  octa fmult ARGS((octa, octa));
  octa fmult(y, z)
       octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    xs = ys + zs - '+'; /* will be '-' when the result is negative */
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + zro: case 4 * zro + num: case 4 * num + zro: x = zero_octa; break;
    case 4 * num + inf: case 4 * inf + num: case 4 * inf + inf: x = inf_octa; break;
    case 4 * zro + inf: case 4 * inf + zro: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * num + num: ⟨Multiply nonzero numbers and return 43⟩;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

42. ⟨The usual NaN cases 42⟩ ≡
  case 4 * nan + nan: if (!(y.h & 0x80000)) exceptions |= I_BIT; /* y is signaling */
  case 4 * zro + nan: case 4 * num + nan: case 4 * inf + nan:
    if (!(z.h & 0x80000)) exceptions |= I_BIT, z.h |= 0x80000;
    return z;
  case 4 * nan + zro: case 4 * nan + num: case 4 * nan + inf:
    if (!(y.h & 0x80000)) exceptions |= I_BIT, y.h |= 0x80000;
    return y;
This code is used in sections 41, 44, 46, and 93.

43. ⟨Multiply nonzero numbers and return 43⟩ ≡
  xe = ye + ze - 0x3fd; /* the raw exponent */
  x = omult(yf, shift_left(zf, 9));
  if (aux.h >= 0x400000) xf = aux;
  else xf = shift_left(aux, 1), xe--;
  if (x.h || x.l) xf.l |= 1; /* adjust the sticky bit */
  return fpack(xf, xe, xs, cur_round);
This code is used in section 41.
```
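The sign computation `xs = ys + zs - '+'` deserves a second look: signs are stored as the characters '+' and '-', and only the exact value '-' is later treated as negative. A small sketch (with the hypothetical helpers `product_sign` and `is_negative`) shows all four cases.

```c
#include <assert.h>

/* Combine two sign characters the way fmult and fdivide do. */
char product_sign(char ys, char zs) {
    return (char)(ys + zs - '+');
}

/* Only the character '-' counts as negative. */
int is_negative(char xs) {
    return xs == '-';
}
```

Mixed signs give '+' + '-' - '+' = '-'. Two minus signs give '-' + ('-' - '+'), which cannot equal '-' because the two characters differ; in ASCII, where '+' is 43 and '-' is 45, the result is 47, the character '/'. So a product of two negatives is correctly classified as nonnegative even though xs is not literally '+'.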

```
44. ⟨Subroutines 5⟩ +≡
  octa fdivide ARGS((octa, octa));
  octa fdivide(y, z)
       octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    xs = ys + zs - '+'; /* will be '-' when the result is negative */
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + inf: case 4 * zro + num: case 4 * num + inf: x = zero_octa; break;
    case 4 * num + zro: exceptions |= Z_BIT;
    case 4 * inf + num: case 4 * inf + zro: x = inf_octa; break;
    case 4 * zro + zro: case 4 * inf + inf: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * num + num: ⟨Divide nonzero numbers and return 45⟩;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

45. ⟨Divide nonzero numbers and return 45⟩ ≡
  xe = ye - ze + 0x3fd; /* the raw exponent */
  xf = odiv(yf, zero_octa, shift_left(zf, 9));
  if (xf.h >= 0x800000) {
    aux.l |= xf.l & 1;
    xf = shift_right(xf, 1, 1);
    xe++;
  }
  if (aux.h || aux.l) xf.l |= 1; /* adjust the sticky bit */
  return fpack(xf, xe, xs, cur_round);
This code is used in section 44.
```

**46.** Floating addition and subtraction. Now for the bread-and-butter operation, the sum of two floating point numbers. It is not terribly difficult, but many cases need to be handled carefully.

```
⟨Subroutines 5⟩ +≡
  octa fplus ARGS((octa, octa));
  octa fplus(y, z)
       octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe, d;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + num: return fpack(zf, ze, zs, ROUND_OFF); break; /* may underflow */
    case 4 * num + zro: return fpack(yf, ye, ys, ROUND_OFF); break; /* may underflow */
    case 4 * inf + inf: if (ys != zs) {
        exceptions |= I_BIT; x = standard_NaN; xs = zs; break;
      }
    case 4 * num + inf: case 4 * zro + inf: x = inf_octa; xs = zs; break;
    case 4 * inf + num: case 4 * inf + zro: x = inf_octa; xs = ys; break;
    case 4 * num + num: if (y.h != (z.h ^ 0x80000000) || y.l != z.l)
        ⟨Add nonzero numbers and return 47⟩;
    case 4 * zro + zro: x = zero_octa;
      xs = (ys == zs ? ys : cur_round == ROUND_DOWN ? '-' : '+'); break;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

47. ⟨Add nonzero numbers and return 47⟩ ≡
  { octa o, oo;
    if (ye < ze || (ye == ze && (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l))))
      ⟨Exchange y with z 48⟩;
    d = ye - ze;
    xs = ys, xe = ye;
    if (d) ⟨Adjust for difference in exponents 49⟩;
    if (ys == zs) {
      xf = oplus(yf, zf);
      if (xf.h >= 0x800000) xe++, d = xf.l & 1, xf = shift_right(xf, 1, 1), xf.l |= d;
    } else {
      xf = ominus(yf, zf);
      if (xf.h >= 0x800000) xe++, d = xf.l & 1, xf = shift_right(xf, 1, 1), xf.l |= d;
      else while (xf.h < 0x400000) xe--, xf = shift_left(xf, 1);
    }
    return fpack(xf, xe, xs, cur_round);
  }
This code is used in section 46.

48. ⟨Exchange y with z 48⟩ ≡
  {
    o = yf, yf = zf, zf = o;
    d = ye, ye = ze, ze = d;
    d = ys, ys = zs, zs = d;
  }
```

This code is used in sections 47 and 51.

**49.** Proper rounding requires two bits to the right of the fraction delivered to fpack. The first is the true next bit of the result; the other is a "sticky" bit, which is nonzero if any further bits of the true result are nonzero. Sticky rounding to an integer takes x into the number  $\lfloor x/2 \rfloor + \lceil x/2 \rceil$ .

Some subtleties need to be observed here, in order to prevent the sticky bit from being shifted left. If we did not shift yf left 1 before shifting zf to the right, an incorrect answer would be obtained in certain cases—for example, if  $yf = 2^{54}$ ,  $zf = 2^{54} + 2^{53} - 1$ , d = 52.

```
⟨Adjust for difference in exponents 49⟩ ≡
  {
     if (d ≤ 2) zf = shift_right(zf, d, 1);   /* exact result */
     else if (d > 53) zf.h = 0, zf.l = 1;   /* tricky but OK */
     else {
        if (ys ≠ zs) d--, xe--, yf = shift_left(yf, 1);
        o = zf;
        zf = shift_right(o, d, 1);
        oo = shift_left(zf, d);
        if (oo.l ≠ o.l ∨ oo.h ≠ o.h) zf.l |= 1;
     }
  }
```

This code is used in section 47.
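The sticky-bit shift described above can be tried in isolation. The following is a minimal sketch on a plain 64-bit integer, assuming `uint64_t` in place of the simulated **octa**; the name `shift_right_sticky` is hypothetical and is not part of MMIXware. It uses the same test as the section above: shift right, shift back left, and set the low bit if the round trip lost anything.

```c
#include <stdint.h>

/* Shift x right by d bits, OR-ing a "sticky" 1 into the lowest bit of the
   result if any nonzero bit was shifted out.
   Hypothetical helper for illustration; not part of MMIXware. */
static uint64_t shift_right_sticky(uint64_t x, unsigned d)
{
    uint64_t q;
    if (d == 0) return x;                /* nothing is lost */
    if (d >= 64) return x != 0;          /* everything falls off the right */
    q = x >> d;
    if ((q << d) != x) q |= 1;           /* some lost bit was nonzero */
    return q;
}
```

For instance, shifting binary 1011 right by 2 gives binary 11: the quotient 10, with the sticky bit set because a nonzero bit was lost.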

```
ARGS = macro (), §2.
cur_round: int, §30.
exceptions: int, §32.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
I_BIT = macro, §31.
inf = 2, §36.
inf_octa: octa, §4.
l: tetra, §3.
num = 1, §36.
octa = struct, §3.
ominus: octa (), §5.
oplus: octa (), §5.
ROUND_DOWN = 3, §30.
ROUND_OFF = 1, §30.
shift_left: octa (), §7.
shift_right: octa (), §7.
sign_bit = macro, §4.
standard_NaN: octa, §4.
zero_octa: octa, §4.
zro = 0, §36.
```

**50.** The comparison of floating point numbers with respect to  $\epsilon$  shares some of the characteristics of floating point addition/subtraction. In some ways it is simpler, and in other ways it is more difficult; we might as well deal with it now.

Subroutine fepscomp(y, z, e, s) returns 2 if y, z, or e is a NaN or e is negative. It returns 1 if s = 0 and  $y \approx z$  (e) or if  $s \neq 0$  and  $y \sim z$  (e), as defined in Section 4.2.2 of Seminumerical Algorithms; otherwise it returns 0.

```
⟨Subroutines 5⟩ +≡
  int fepscomp ARGS((octa, octa, octa, int));
  int fepscomp(y, z, e, s)
       octa y, z, e;   /* the operands */
       int s;   /* test similarity? */
  {
     octa yf, zf, ef, o, oo;
     int ye, ze, ee;
     char ys, zs, es;
     register int yt, zt, et, d;
     et = funpack(e, &ef, &ee, &es);
     if (es ≡ '-') return 2;
     switch (et) {
     case nan: return 2;
     case inf: ee = 10000;
     case num: case zro: break;
     }
     yt = funpack(y, &yf, &ye, &ys);
     zt = funpack(z, &zf, &ze, &zs);
     switch (4 * yt + zt) {
     case 4 * nan + nan: case 4 * nan + inf: case 4 * nan + num: case 4 * nan + zro:
     case 4 * inf + nan: case 4 * num + nan: case 4 * zro + nan: return 2;
     case 4 * inf + inf: return (ys ≡ zs ∨ ee ≥ 1023);
     case 4 * inf + num: case 4 * inf + zro: case 4 * num + inf: case 4 * zro + inf:
        return (s ∧ ee ≥ 1022);
     case 4 * zro + zro: return 1;
     case 4 * zro + num: case 4 * num + zro: if (¬s) return 0;
     case 4 * num + num: break;
     }
     ⟨Compare two numbers with respect to epsilon and return 51⟩;
  }
```

**51.** The relation $y \approx z\ (\epsilon)$ reduces to $y \sim z\ (\epsilon/2^d)$, if d is the difference between the larger and smaller exponents of y and z.

```
⟨Compare two numbers with respect to epsilon and return 51⟩ ≡
  ⟨Unsubnormalize y and z, if they are subnormal 52⟩;
  if (ye < ze ∨ (ye ≡ ze ∧ (yf.h < zf.h ∨ (yf.h ≡ zf.h ∧ yf.l < zf.l))))
     ⟨Exchange y with z 48⟩;
  if (ze ≡ zero_exponent) ze = ye;
  d = ye - ze;
  if (¬s) ee -= d;
  if (ee ≥ 1023) return 1;   /* if ε ≥ 2, z ∈ N_ε(y) */
  ⟨Compute the difference of fraction parts, o 53⟩;
  if (¬o.h ∧ ¬o.l) return 1;
  if (ee < 968) return 0;   /* if y ≠ z and ε < 2^(-54), y ≁ z */
  if (ee ≥ 1021) ef = shift_left(ef, ee - 1021);
  else ef = shift_right(ef, 1021 - ee, 1);
  return o.h < ef.h ∨ (o.h ≡ ef.h ∧ o.l ≤ ef.l);
```

This code is used in section 50.

**52.** ⟨Unsubnormalize y and z, if they are subnormal 52⟩ ≡

```
  if (ye < 0 ∧ yt ≠ zro) yf = shift_left(y, 2), ye = 0;
  if (ze < 0 ∧ zt ≠ zro) zf = shift_left(z, 2), ze = 0;
```

This code is used in section 51.

**53.** At this point  $y \sim z$  if and only if

$$yf + (-1)^{[ys=zs]} zf/2^d \le 2^{ee-1021} ef = 2^{55} \epsilon.$$

We need to evaluate this relation without overstepping the bounds of our simulated 64-bit registers.

When d>2, the difference of fraction parts might not fit exactly in an octabyte; in that case the numbers are not similar unless  $\epsilon>3/8$ , and we replace the difference by the ceiling of the true result. When  $\epsilon<1/8$ , our program essentially replaces  $2^{55}\epsilon$  by  $\lfloor 2^{55}\epsilon \rfloor$ . These truncations are not needed simultaneously. Therefore the logic is justified by the facts that, if n is an integer, we have  $x\leq n$  if and only if  $\lceil x\rceil\leq n$ ;  $n\leq x$  if and only if  $n\leq \lfloor x\rfloor$ . (Notice that the concept of "sticky bit" is not appropriate here.)

```
⟨Compute the difference of fraction parts, o 53⟩ ≡
  if (d > 54) o = zero_octa, oo = zf;
  else o = shift_right(zf, d, 1), oo = shift_left(o, d);
  if (oo.h ≠ zf.h ∨ oo.l ≠ zf.l) {   /* truncated result, hence d > 2 */
     if (ee < 1020) return 0;   /* difference is too large for similarity */
     o = incr(o, ys ≡ zs ? 0 : 1);   /* adjust for ceiling */
  }
  o = (ys ≡ zs ? ominus(yf, o) : oplus(yf, o));
```

This code is used in section 51.

```
ARGS = macro (), §2.
funpack: ftype (), §37.
h: tetra, §3.
incr: octa (), §6.
inf = 2, §36.
l: tetra, §3.
nan = 3, §36.
num = 1, §36.
octa = struct, §3.
ominus: octa (), §5.
oplus: octa (), §5.
shift_left: octa (), §7.
shift_right: octa (), §7.
zero_exponent = macro, §36.
zero_octa: octa, §4.
zro = 0, §36.
```

**54.** Floating point output conversion. The *print\_float* routine converts an octabyte to a floating decimal representation that will be input as precisely the same value.

```
⟨Subroutines 5⟩ +≡
  static void bignum_times_ten ARGS((bignum *));
  static void bignum_dec ARGS((bignum *, bignum *, tetra));
  static int bignum_compare ARGS((bignum *, bignum *));
  void print_float ARGS((octa));
  void print_float(x)
       octa x;
  {
     ⟨Local variables for print_float 56⟩;
     if (x.h & sign_bit) printf("-");
     ⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩;
     ⟨Store f and g as multiprecise integers 63⟩;
     ⟨Compute the significant digits s and decimal exponent e 64⟩;
     ⟨Print the significant digits with proper context 67⟩;
  }
```

**55.** One way to visualize the problem being solved here is to consider the vastly simpler case in which there are only 2-bit exponents and 2-bit fractions. Then the sixteen possible 4-bit combinations have the following interpretations:

```
0000   [0 .. 0.125]
0001   (0.125 .. 0.375)
0010   [0.375 .. 0.625]
0011   (0.625 .. 0.875)
0100   [0.875 .. 1.125]
0101   (1.125 .. 1.375)
0110   [1.375 .. 1.625]
0111   (1.625 .. 1.875)
1000   [1.875 .. 2.25]
1001   (2.25 .. 2.75)
1010   [2.75 .. 3.25]
1011   (3.25 .. 3.75)
1100   [3.75 .. ∞]
1101   NaN(0 .. 0.375)
1110   NaN[0.375 .. 0.625]
1111   NaN(0.625 .. 1)
```
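The numeric rows of the table can be reproduced mechanically: each endpoint is the midpoint between a value and its neighbour. The sketch below assumes the toy format has exponent bias 1 (which is what the table implies) and uses the hypothetical names `mini_value`, `mini_lower`, and `mini_upper`; none of them appear in MMIXware.

```c
/* Value of the 4-bit pattern b (2-bit exponent, 2-bit fraction) in the toy
   format of the table above. Pattern 1100 is treated as the number 4 here,
   only so that it can serve as the right neighbour of 1011. */
static double mini_value(unsigned b)
{
    unsigned e = (b >> 2) & 3, f = b & 3;
    if (e == 0) return f / 4.0;                      /* subnormal: 0, .25, .5, .75 */
    return (1 + f / 4.0) * (double)(1u << (e - 1));  /* 1..1.75, 2..3.5, 4 */
}

/* Interval endpoints for patterns 0001 through 1011: each endpoint is the
   midpoint between the value and the adjacent value. */
static double mini_lower(unsigned b)
{
    return (mini_value(b - 1) + mini_value(b)) / 2;
}

static double mini_upper(unsigned b)
{
    return (mini_value(b) + mini_value(b + 1)) / 2;
}
```

Pattern 1000 has value 2, with neighbours 1.75 and 2.5, so its interval runs from 1.875 to 2.25 as in the table.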

Notice that the interval is closed, [f .. g], when the fraction part is even; it is open, (f .. g), when the fraction part is odd. The printed outputs for these sixteen values, if we actually were dealing with such short exponents and fractions, would be 0., .2, .5, .7, 1., 1.2, 1.5, 1.7, 2., 2.5, 3., 3.5, Inf, NaN.2, NaN, NaN.8, respectively.

```
⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩ ≡
  f = shift_left(x, 1);
  e = f.h ≫ 21;
  f.h &= #1fffff;
  if (¬f.h ∧ ¬f.l) ⟨Handle the special case when the fraction part is zero 57⟩
  else {
     g = incr(f, 1);
     f = incr(f, -1);
     if (¬e) e = 1;   /* subnormal */
     else if (e ≡ #7ff) {
        printf("NaN");
        if (g.h ≡ #100000 ∧ g.l ≡ 1) return;   /* the "standard" NaN */
        e = #3ff;   /* extreme NaNs come out OK even without adjusting f or g */
     } else f.h |= #200000, g.h |= #200000;
  }
```

This code is used in section 54.

**56.** ⟨Local variables for print_float 56⟩ ≡

```
  octa f, g;   /* lower and upper bounds on the fraction part */
  register int e;   /* exponent part */
  register int j, k;   /* all purpose indices */
```

See also section 66.

This code is used in section 54.

**57.** The transition points between exponents correspond to powers of 2. At such points the interval extends only half as far to the left of that power of 2 as it does to the right. For example, in the 4-bit minifloat numbers considered above, case 1000 corresponds to the interval [1.875 .. 2.25].

```
⟨Handle the special case when the fraction part is zero 57⟩ ≡
  {
     if (¬e) {
        printf("0."); return;
     }
     if (e ≡ #7ff) {
        printf("Inf"); return;
     }
     e--;
     f.h = #3fffff, f.l = #ffffffff;
     g.h = #400000, g.l = 2;
  }
```

This code is used in section 55.

**58.** We want to find the "simplest" value in the interval corresponding to the given number, in the sense that it has fewest significant digits when expressed in decimal notation. Thus, for example, if the floating point number can be described by a relatively short string such as '.1' or '37e100', we want to discover that representation.

The basic idea is to generate the decimal representations of the two endpoints of the interval, outputting the leading digits where both endpoints agree, then making a final decision at the first place where they disagree.

The "simplest" value is not always unique. For example, in the case of 4-bit minifloat numbers we could represent the bit pattern 0001 as either .2 or .3, and we could represent 1001 in five equally short ways: 2.3 or 2.4 or 2.5 or 2.6 or 2.7. The algorithm below tries to choose the middle possibility in such cases.

[A solution to the analogous problem for fixed-point representations, without the additional complication of round-to-even, was used by the author in the program for TeX; see *Beauty is Our Business* (Springer, 1990), 233–242.]

Suppose we are given two fractions f and g, where  $0 \le f < g < 1$ , and we want to compute the shortest decimal in the closed interval  $[f \dots g]$ . If f = 0, we are done. Otherwise let 10f = d + f' and 10g = e + g', where  $0 \le f' < 1$  and  $0 \le g' < 1$ . If d < e, we can terminate by outputting any of the digits  $d + 1, \dots, e$ ; otherwise we output the common digit d = e, and repeat the process on the fractions  $0 \le f' < g' < 1$ . A similar procedure works with respect to the open interval  $(f \dots g)$ .
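The closed-interval procedure just described can be sketched with ordinary integers. This is an illustration under stated assumptions, not the multiprecision code that follows: the endpoints are given as numerators `fn` and `gn` over a common denominator $2^t$ with $t \le 60$, the function name `shortest_decimal` is hypothetical, and when several final digits are possible the middle one is chosen.

```c
#include <assert.h>
#include <stdint.h>

/* Shortest decimal fraction in the closed interval [fn/2^t .. gn/2^t],
   where 0 <= fn < gn < 2^t and t <= 60 (so that 10*gn cannot overflow).
   Writes the digits that follow the decimal point into out (t+1 bytes
   always suffice) and returns how many digits were written. */
static int shortest_decimal(uint64_t fn, uint64_t gn, unsigned t, char *out)
{
    const uint64_t mask = ((uint64_t)1 << t) - 1;
    int n = 0;
    assert(t <= 60 && fn < gn && gn <= mask);
    while (fn != 0) {                      /* if f = 0, we are done */
        unsigned d, e;
        fn *= 10; gn *= 10;
        d = (unsigned)(fn >> t); fn &= mask;   /* 10f = d + f' */
        e = (unsigned)(gn >> t); gn &= mask;   /* 10g = e + g' */
        if (d < e) {                       /* any digit in d+1..e will do */
            out[n++] = (char)('0' + ((d + 1 + e) >> 1));
            break;
        }
        out[n++] = (char)('0' + d);        /* common digit d = e */
    }
    out[n] = '\0';
    return n;
}
```

With `fn = 31`, `gn = 33`, `t = 8` (the interval [0.12109375 .. 0.12890625]) the digits produced are `125`, that is, .125.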

**59.** The program below carries out the stated algorithm by using multiprecision arithmetic on 77-place integers with 28 bits each. This choice facilitates multiplication by 10, and allows us to deal with the whole range of floating binary numbers using fixed point arithmetic. We keep track of the leading and trailing digit positions so that trivial operations on zeros are avoided.

If f points to a **bignum**, its radix-$2^{28}$ digits are f→dat[0] through f→dat[76], from most significant to least significant. We assume that all digit positions are zero unless they lie in the subarray between indices f→a and f→b, inclusive. Furthermore, both f→dat[f→a] and f→dat[f→b] are nonzero, unless f→a = f→b = bignum_prec − 1.

The **bignum** data type can be used with any radix less than  $2^{32}$ ; we will use it later with radix  $10^9$ . The *dat* array is made large enough to accommodate both applications.

```
#define bignum_prec 157   /* would be 77 if we cared only about print_float */
⟨Other type definitions 36⟩ +≡
  typedef struct {
    int a;   /* index of the most significant digit */
    int b;   /* index of the least significant digit; must be ≥ a */
    tetra dat[bignum_prec];   /* the digits; undefined except between a and b */
  } bignum;
```

**60.** Here, for example, is how we go from f to 10f, assuming that overflow will not occur and that the radix is  $2^{28}$ :

```
⟨Subroutines 5⟩ +≡
  static void bignum_times_ten(f)
       bignum *f;
  {
     register tetra *p, *q;
     register tetra x, carry;
     for (p = &f→dat[f→b], q = &f→dat[f→a], carry = 0; p ≥ q; p--) {
        x = *p * 10 + carry;
        *p = x & #fffffff;
        carry = x ≫ 28;
     }
     *p = carry;
     if (carry) f→a--;
     if (f→dat[f→b] ≡ 0 ∧ f→b > f→a) f→b--;
  }
```

**61.** And here is how we test whether f < g, f = g, or f > g, using any radix whatever:

```
⟨Subroutines 5⟩ +≡
  static int bignum_compare(f, g)
       bignum *f, *g;
  {
     register tetra *p, *pp, *q, *qq;
     if (f→a ≠ g→a) return f→a > g→a ? -1 : 1;
     pp = &f→dat[f→b], qq = &g→dat[g→b];
     for (p = &f→dat[f→a], q = &g→dat[g→a]; p ≤ pp; p++, q++) {
        if (*p ≠ *q) return *p < *q ? -1 : 1;
        if (q ≡ qq) return p < pp;
     }
     return -1;
  }
```
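For readers who want to experiment, multiplication by ten in radix $2^{28}$ can be sketched in present-day C as follows. The type `big`, the constant `PREC`, and the function `big_times_ten` are hypothetical stand-ins for **bignum**, *bignum_prec*, and *bignum_times_ten*; the sketch uses array indices instead of pointers and stores a new leading digit only when the carry is nonzero.

```c
#include <assert.h>
#include <stdint.h>

#define PREC 8                      /* illustrative size; the text uses 157 */

typedef struct {
    int a, b;                       /* indices of most and least significant digits */
    uint32_t dat[PREC];             /* radix-2^28 digits, dat[0] most significant */
} big;

/* f <- 10*f, assuming the product still fits in PREC digits. */
static void big_times_ten(big *f)
{
    uint32_t carry = 0;
    int i;
    for (i = f->b; i >= f->a; i--) {
        uint32_t x = f->dat[i] * 10 + carry;   /* below 10 * 2^28, fits in 32 bits */
        f->dat[i] = x & 0xfffffff;             /* keep the low 28 bits */
        carry = x >> 28;                       /* at most 9 */
    }
    if (carry) {
        assert(f->a > 0);                      /* the no-overflow assumption */
        f->dat[--f->a] = carry;
    }
    if (f->dat[f->b] == 0 && f->b > f->a) f->b--;   /* drop a trailing zero digit */
}
```

Starting from the single digit $2^{27}$, one call produces the single digit 5 one position to the left, since $10 \cdot 2^{27} = 5 \cdot 2^{28}$.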

**62.** The following subroutine subtracts g from f, assuming that  $f \geq g > 0$  and using a given radix.

```
⟨Subroutines 5⟩ +≡
  static void bignum_dec(f, g, r)
       bignum *f, *g;
       tetra r;   /* the radix */
  {
     register tetra *p, *q, *qq;
     register int x, borrow;
     while (g→b > f→b) f→dat[++f→b] = 0;
     qq = &g→dat[g→a];
     for (p = &f→dat[g→b], q = &g→dat[g→b], borrow = 0; q ≥ qq; p--, q--) {
        x = *p - *q - borrow;
        if (x ≥ 0) borrow = 0, *p = x;
        else borrow = 1, *p = x + r;
     }
     for (; borrow; p--)
        if (*p) borrow = 0, *p = *p - 1;
        else *p = r - 1;
     while (f→dat[f→a] ≡ 0) {
        if (f→a ≡ f→b) {   /* the result is zero */
           f→a = f→b = bignum_prec - 1, f→dat[bignum_prec - 1] = 0;
           return;
        }
        f→a++;
     }
     while (f→dat[f→b] ≡ 0) f→b--;
  }
```
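The borrow propagation at the heart of this subroutine can be seen in a stripped-down form. The sketch below is a simplification, not the routine above: it works on fixed-length digit arrays without the a and b indices, and the names `NDIG` and `digits_sub` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NDIG 4   /* illustrative number of digits */

/* f <- f - g for NDIG-digit numbers in radix r (most significant digit
   first), assuming f >= g. Returns the final borrow, which is 0 whenever
   that assumption holds. */
static int digits_sub(uint32_t f[NDIG], const uint32_t g[NDIG], uint32_t r)
{
    int borrow = 0, i;
    assert(r >= 2);
    for (i = NDIG - 1; i >= 0; i--) {
        int64_t x = (int64_t)f[i] - g[i] - borrow;
        if (x >= 0) borrow = 0;            /* a zero difference needs no borrow */
        else borrow = 1, x += r;           /* borrow one unit from the left */
        f[i] = (uint32_t)x;
    }
    return borrow;
}
```

In radix $10^9$, subtracting 1 from $10^{18}$ leaves two digits equal to 999999999, with the borrow absorbed by the third digit from the right.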

**63.** Armed with these subroutines, we are ready to solve the problem. The first task is to put the numbers into **bignum** form. If the exponent is e, the number destined for digit dat[k] will consist of the rightmost 28 bits of the given fraction after it has been shifted right c - e - 28k bits, for some constant c. We choose c so that, when e has its maximum value #7ff, the leading digit will go into position dat[1], and so that when the number to be printed is exactly 1 the integer part of g will also be exactly 1.

```
#define magic_offset 2112   /* the constant c that makes it work */
#define origin 37   /* the radix point follows dat[37] */
⟨Store f and g as multiprecise integers 63⟩ ≡
  k = (magic_offset - e)/28;
  ff.dat[k-1] = shift_right(f, magic_offset + 28 - e - 28 * k, 1).l & #fffffff;
  gg.dat[k-1] = shift_right(g, magic_offset + 28 - e - 28 * k, 1).l & #fffffff;
  ff.dat[k] = shift_right(f, magic_offset - e - 28 * k, 1).l & #fffffff;
  gg.dat[k] = shift_right(g, magic_offset - e - 28 * k, 1).l & #fffffff;
  ff.dat[k+1] = shift_left(f, e + 28 * k - (magic_offset - 28)).l & #fffffff;
  gg.dat[k+1] = shift_left(g, e + 28 * k - (magic_offset - 28)).l & #fffffff;
  ff.a = (ff.dat[k-1] ? k - 1 : k);
  ff.b = (ff.dat[k+1] ? k + 1 : k);
  gg.a = (gg.dat[k-1] ? k - 1 : k);
  gg.b = (gg.dat[k+1] ? k + 1 : k);
```

This code is used in section 54.

**64.** If e is sufficiently small, the fractions f and g will be less than 1, and we can use the stated algorithm directly. Of course, if e is extremely small, a lot of leading zeros need to be lopped off; in the worst case, we may have to multiply f and g by 10 more than 300 times. But hey, we don't need to do that extremely often, and computers are pretty fast nowadays.

In the small-exponent case, the computation always terminates before f becomes zero, because the interval endpoints are fractions with denominator  $2^t$  for some t > 50.

The invariant relations  $ff.dat[ff.a] \neq 0$  and  $gg.dat[gg.a] \neq 0$  are not maintained by the computation here, when ff.a = origin or gg.a = origin. But no harm is done, because  $bignum\_compare$  is not used.

```
⟨Compute the significant digits s and decimal exponent e 64⟩ ≡
  if (e > #401) ⟨Compute the significant digits in the large-exponent case 65⟩
  else {   /* if e ≤ #401 we have gg.a ≥ origin and gg.dat[origin] ≤ 8 */
     if (ff.a > origin) ff.dat[origin] = 0;
     for (e = 1, p = s; gg.a > origin ∨ ff.dat[origin] ≡ gg.dat[origin]; ) {
        if (gg.a > origin) e--;
        else *p++ = ff.dat[origin] + '0', ff.dat[origin] = 0, gg.dat[origin] = 0;
        bignum_times_ten(&ff);
        bignum_times_ten(&gg);
     }
     *p++ = ((ff.dat[origin] + 1 + gg.dat[origin]) ≫ 1) + '0';   /* the middle digit */
  }
  *p = '\0';   /* terminate the string s */
```

This code is used in section 54.

```
a: int, §59.
b: int, §59.
bignum = struct, §59.
bignum_compare: static int (), §61.
bignum_prec = 157, §59.
bignum_times_ten: static void (), §60.
dat: tetra [], §59.
e: register int, §56.
f: octa, §56.
ff: bignum, §66.
g: octa, §56.
gg: bignum, §66.
k: register int, §56.
l: tetra, §3.
p: register char *, §66.
s: char [], §66.
shift_left: octa (), §7.
shift_right: octa (), §7.
tetra = unsigned int, §3.
```

**65.** When e is large, we use the stated algorithm by considering f and g to be fractions whose denominator is a power of 10.

An interesting case arises when the number to be converted is #44ada56a4b0835bf, since the interval turns out to be

$$(69999999999999991611392 \;\dots\; 70000000000000000000000).$$

If this were a closed interval, we could simply give the answer 7e22; but the number 7e22 actually corresponds to #44ada56a4b0835c0 because of the round-to-even rule. Therefore the correct answer is, say, 6.9999999999999995e22. This example shows that we need a slightly different strategy in the case of open intervals; we cannot simply look at the first position in which the endpoints have different decimal digits. Therefore we change the invariant relation to  $0 \le f < g \le 1$, when open intervals are involved, and we do not terminate the process when f = 0 or g = 1.
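The round-to-even behaviour at 7e22 can be checked on a host machine. This is a sketch that assumes the host's `double` is IEEE 754 binary64 and that its `strtod` rounds correctly to nearest-even; the helper name `bits_of_decimal` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bit pattern of the double nearest to the decimal string s, assuming that
   double is IEEE 754 binary64 and that strtod rounds correctly. */
static uint64_t bits_of_decimal(const char *s)
{
    double d = strtod(s, NULL);
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* reinterpret the bits without aliasing problems */
    return u;
}
```

Under those assumptions the string "7e22", which lies exactly halfway between two adjacent doubles, maps to the even pattern #44ada56a4b0835c0, while a 17-digit string such as "6.9999999999999995e22" is needed to reach #44ada56a4b0835bf.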

```
⟨Compute the significant digits in the large-exponent case 65⟩ ≡
  { register int open = x.l & 1;
     tt.dat[origin] = 10;
     tt.a = tt.b = origin;
     for (e = 1; bignum_compare(&gg, &tt) ≥ open; e++) bignum_times_ten(&tt);
     p = s;
     while (1) {
        bignum_times_ten(&ff);
        bignum_times_ten(&gg);
        for (j = '0'; bignum_compare(&ff, &tt) ≥ 0; j++)
           bignum_dec(&ff, &tt, #10000000), bignum_dec(&gg, &tt, #10000000);
        if (bignum_compare(&gg, &tt) ≥ open) break;
        *p++ = j;
        if (ff.a ≡ bignum_prec - 1 ∧ ¬open) goto done;   /* f = 0 in a closed interval */
     }
     for (k = j; bignum_compare(&gg, &tt) ≥ open; k++)
        bignum_dec(&gg, &tt, #10000000);
     *p++ = (j + 1 + k) ≫ 1;   /* the middle digit */
  done:;
  }
```

This code is used in section 64.

**66.** The length of string s will be at most 17. For if f and g agree to 17 places, we have $g/f < 1 + 10^{-16}$; but the ratio g/f is always $\ge (1 + 2^{-52} + 2^{-53})/(1 + 2^{-52} - 2^{-53}) > 1 + 2 \times 10^{-16}$.

```
⟨Local variables for print_float 56⟩ +≡
  bignum ff, gg;   /* fractions or numerators of fractions */
  bignum tt;   /* power of ten (used as the denominator) */
  char s[18];
  register char *p;
```

**67.** At this point the significant digits are in string s, and  $s[0] \neq 0$ . If we put a decimal point at the left of s, the result should be multiplied by  $10^e$ .

We prefer the output '300.' to the form '3e2', and we prefer '.03' to '3e-2'. In general, the output will use an explicit exponent only if the alternative would take more than 18 characters.

```
⟨Print the significant digits with proper context 67⟩ ≡
  if (e > 17 ∨ e < (int) strlen(s) - 17)
     printf("%c%s%se%d", s[0], (s[1] ? "." : ""), s + 1, e - 1);
  else if (e < 0) printf(".%0*d%s", -e, 0, s);
  else if (strlen(s) ≥ e) printf("%.*s.%s", e, s, s + e);
  else printf("%s%0*d.", s, e - (int) strlen(s), 0);
```

This code is used in section 54.
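The four output cases can be exercised without the rest of the program. The sketch below follows the same format strings but writes into a caller-supplied buffer with `snprintf`; the function name `format_digits` and its buffer parameters are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Format the digit string s (with s[0] != '0'), whose value is 0.s times
   10^e, using the same four cases as the section above. Writes at most n
   bytes to out and returns the result of snprintf. */
static int format_digits(const char *s, int e, char *out, size_t n)
{
    int len = (int)strlen(s);
    if (e > 17 || e < len - 17)            /* too long otherwise: use an exponent */
        return snprintf(out, n, "%c%s%se%d", s[0], s[1] ? "." : "", s + 1, e - 1);
    if (e < 0)                             /* leading zeros after the point */
        return snprintf(out, n, ".%0*d%s", -e, 0, s);
    if (len >= e)                          /* the point falls inside or right after s */
        return snprintf(out, n, "%.*s.%s", e, s, s + e);
    return snprintf(out, n, "%s%0*d.", s, e - len, 0);   /* trailing zeros */
}
```

For example, the digit string "3" with e = 3 comes out as `300.`, and with e = -1 as `.03`, in line with the preferences stated above.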

```
a: int, §59.
b: int, §59.
bignum = struct, §59.
bignum_compare: static int (), §61.
bignum_dec: static void (), §62.
bignum_prec = 157, §59.
bignum_times_ten: static void (), §60.
dat: tetra [], §59.
e: register int, §56.
j: register int, §56.
k: register int, §56.
l: tetra, §3.
origin = 37, §63.
print_float: void (), §54.
printf: int (), <stdio.h>.
strlen: size_t (), <string.h>.
x: octa, §54.
```

**68.** Floating point input conversion. Going the other way, we want to be able to convert a given decimal number into its floating binary equivalent. The following syntax is supported:

For example, '-3.' is the floating constant #c008000000000000; '1e3' and '1000.' are both equivalent to #408f400000000000; 'NaN' and '+NaN.5' are both equivalent to #7ff8000000000000.

The  $scan\_const$  routine looks at a given string and finds the longest initial substring that matches the syntax of either  $\langle$  decimal constant  $\rangle$  or  $\langle$  floating constant  $\rangle$ . It puts the corresponding value into the global octabyte variable val; it also puts the position of the first unscanned character in the global pointer variable  $next\_char$ . It returns 1 if a floating constant was found, 0 if a decimal constant was found, -1 if nothing was found. A decimal constant that doesn't fit in an octabyte is computed modulo  $2^{64}$ .

The value of *exceptions* set by *scan\_const* is not necessarily correct.

```
⟨Subroutines 5⟩ +≡
  static void bignum_double ARGS((bignum *));
  int scan_const ARGS((char *));
  int scan_const(s)
       char *s;
  {
     ⟨Local variables for scan_const 70⟩;
     val.h = val.l = 0;
     p = s;
     if (*p ≡ '+' ∨ *p ≡ '-') sign = *p++; else sign = '+';
     if (strncmp(p, "NaN", 3) ≡ 0) NaN = true, p += 3;
     else NaN = false;
     if ((isdigit(*p) ∧ ¬NaN) ∨ (*p ≡ '.' ∧ isdigit(*(p + 1))))
        ⟨Scan a number and return 73⟩;
     if (NaN) ⟨Return the standard NaN 71⟩;
     if (strncmp(p, "Inf", 3) ≡ 0) ⟨Return infinity 72⟩;
   no_const_found: next_char = s; return -1;
  }
```

**69.** ⟨Global variables 4⟩ +≡

```
  octa val;   /* value returned by scan_const */
  char *next_char;   /* pointer returned by scan_const */
```

**70.** ⟨Local variables for scan_const 70⟩ ≡

```
  register char *p, *q;   /* for string manipulations */
  register bool NaN;   /* are we processing a NaN? */
  int sign;   /* '+' or '-' */
```

See also sections 76 and 81.

This code is used in section 68.

**71.** ⟨Return the standard NaN 71⟩ ≡

```
  {
     next_char = p;
     val.h = #600000, exp = #3fe;
     goto packit;
  }
```

This code is used in section 68.

**72.** ⟨Return infinity 72⟩ ≡

```
  {
     next_char = p + 3;
     goto make_it_infinite;
  }
```

This code is used in section 68.

ARGS = macro (), §2. bignum = struct, §59. bool: enum, §1. exceptions: int, §32. exp: register int, §76. false = 0, §1. h: tetra, §3. isdigit: int (), <ctype.h>. l: tetra, §3. make_it_infinite: label, §79. octa = struct, §3. packit: label, §78. strncmp: int (), <string.h>. true = 1, §1.

We assume here that the user prefers a perfectly correct answer to a speedy almost-correct one, so we implement the most general case.

```
⟨Scan a number and return 73⟩ ≡
  {
     for (q = buf0, dec_pt = (char *) 0; isdigit(*p); p++) {
        val = oplus(val, shift_left(val, 2));   /* multiply by 5 */
        val = incr(shift_left(val, 1), *p - '0');
        if (q > buf0 ∨ *p ≠ '0')
           if (q < buf_max) *q++ = *p;
           else if (*(q - 1) ≡ '0') *(q - 1) = *p;
     }
     if (NaN) *q++ = '1';
     if (*p ≡ '.') ⟨Scan a fraction part 74⟩;
     next_char = p;
     exp = 0;
     if (*p ≡ 'e' ∧ ¬NaN) ⟨Scan an exponent 77⟩;
     if (dec_pt) ⟨Return a floating point constant 78⟩;
     if (sign ≡ '-') val = ominus(zero_octa, val);
     return 0;
  }
```

This code is used in section 68.

**74.** ⟨Scan a fraction part 74⟩ ≡

```
  {
     dec_pt = q;
     p++;
     for (zeros = 0; isdigit(*p); p++)
        if (*p ≡ '0' ∧ q ≡ buf0) zeros++;
        else if (q < buf_max) *q++ = *p;
        else if (*(q - 1) ≡ '0') *(q - 1) = *p;
  }
```

This code is used in section 73.
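The integer part of the scan multiplies by 5 and then doubles, which needs only shifts and additions on the simulated octabyte. A minimal sketch of the same idea on a native 64-bit integer follows; `scan_decimal` is a hypothetical name, and unsigned wraparound supplies the reduction modulo $2^{64}$.

```c
#include <ctype.h>
#include <stdint.h>

/* Value of the leading decimal digit string of s, modulo 2^64, computed
   with shifts and adds only: 10v = 2 * (v + 4v). */
static uint64_t scan_decimal(const char *s)
{
    uint64_t val = 0;
    for (; isdigit((unsigned char)*s); s++) {
        val += val << 2;                          /* multiply by 5 */
        val = (val << 1) + (uint64_t)(*s - '0');  /* double, then add the digit */
    }
    return val;
}
```

The string "18446744073709551617", which denotes $2^{64} + 1$, therefore scans to 1.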

**75.** The buffer needs room for eight digits of padding at the left, followed by up to 1022 + 53 - 307 significant digits, followed by a "sticky" digit at position  $buf\_max - 1$ , and eight more digits of padding.

```
#define buf0 (buf + 8)
#define buf_max (buf + 777)
⟨Global variables 4⟩ +≡
  static char buf[785] = "00000000";   /* where we put significant input digits */
```

**76.** ⟨Local variables for scan_const 70⟩ +≡

```
  register char *dec_pt;   /* position of decimal point in buf */
  register int exp;   /* scanned exponent; later used for raw binary exponent */
  register int zeros;   /* leading zeros removed after decimal point */
```

**77.** Here we don't advance *next\_char* and force a decimal point until we know that a syntactically correct exponent exists.

```
⟨Scan an exponent 77⟩ ≡
  { register char exp_sign;
     p++;
     if (*p ≡ '+' ∨ *p ≡ '-') exp_sign = *p++; else exp_sign = '+';
     if (isdigit(*p)) {
        for (exp = *p++ - '0'; isdigit(*p); p++)
           if (exp < 100000000) exp = 10 * exp + *p - '0';
        if (¬dec_pt) dec_pt = q, zeros = 0;
        if (exp_sign ≡ '-') exp = -exp;
        next_char = p;
     }
  }
```

This code is used in section 73.

**78.** ⟨Return a floating point constant 78⟩ ≡

```
  {
     ⟨Move the digits from buf to ff 79⟩;
     ⟨Determine the binary fraction and binary exponent 83⟩;
   packit: ⟨Pack and round the answer 84⟩;
     return 1;
  }
```

This code is used in section 73.

**79.** Now we get ready to compute the binary fraction bits, by putting the scanned input digits into a multiprecision fixed-point accumulator ff that spans the full necessary range. After this step, the number that we want to convert to floating binary will appear in ff.dat[ff.a], ff.dat[ff.a+1], ..., ff.dat[ff.b]. The radix-$10^9$ digit in ff.dat[36 − k] is understood to be multiplied by  $10^{9k}$, for  $36 \ge k \ge -120$.

```
⟨Move the digits from buf to ff 79⟩ ≡
  x = buf + 341 + zeros - dec_pt - exp;
  if (q ≡ buf0 ∨ x ≥ 1413) {
  make_it_zero: exp = -99999; goto packit;
  }
  if (x < 0) {
  make_it_infinite: exp = 99999; goto packit;
  }
  ff.a = x / 9;
  for (p = q; p < q + 8; p++) *p = '0'; /* pad with trailing zeros */
  q = q - 1 - (q + 341 + zeros - dec_pt - exp) % 9; /* compute stopping place in buf */
  for (p = buf0 - x % 9, k = ff.a; p ≤ q ∧ k ≤ 156; p += 9, k++)
    ⟨Put the 9-digit number *p … *(p+8) into ff.dat[k] 80⟩;
  ff.b = k - 1;
  for (x = 0; p ≤ q; p += 9)
    if (strncmp(p, "000000000", 9) ≠ 0) x = 1;
  ff.dat[156] += x; /* nonzero digits that fall off the right are sticky */
  while (ff.dat[ff.b] ≡ 0) ff.b--;
```

This code is used in section 78.

**80.**

```
⟨Put the 9-digit number *p … *(p+8) into ff.dat[k] 80⟩ ≡
  for (x = *p - '0', pp = p + 1; pp < p + 9; pp++) x = 10 * x + *pp - '0';
  ff.dat[k] = x;
```

This code is used in section 79.

**81.**

```
⟨Local variables for scan_const 70⟩ +≡
  register int k, x;
  register char *pp;
  bignum ff, tt;
```

**82.** Here's a subroutine that is dual to bignum_times_ten. It changes f to 2f, assuming that overflow will not occur and that the radix is $10^9$.

```
⟨Subroutines 5⟩ +≡
  static void bignum_double(f)
    bignum *f;
  {
    register tetra *p, *q;
    register int x, carry;
    for (p = &f→dat[f→b], q = &f→dat[f→a], carry = 0; p ≥ q; p--) {
      x = *p + *p + carry;
      if (x ≥ 1000000000) carry = 1, *p = x - 1000000000;
      else carry = 0, *p = x;
    }
    *p = carry;
    if (carry) f→a--;
    if (f→dat[f→b] ≡ 0 ∧ f→b > f→a) f→b--;
  }
```

**83.**

```
⟨Determine the binary fraction and binary exponent 83⟩ ≡
  val = zero_octa;
  if (ff.a > 36) {
    for (exp = #3fe; ff.a > 36; exp--) bignum_double(&ff);
    for (k = 54; k; k--) {
      if (ff.dat[36]) {
        if (k ≥ 32) val.h |= 1 ≪ (k - 32); else val.l |= 1 ≪ k;
        ff.dat[36] = 0;
        if (ff.b ≡ 36) break; /* break if ff now zero */
      }
      bignum_double(&ff);
    }
  } else {
    tt.a = tt.b = 36, tt.dat[36] = 2;
    for (exp = #3fe; bignum_compare(&ff, &tt) ≥ 0; exp++) bignum_double(&tt);
    for (k = 54; k; k--) {
      bignum_double(&ff);
      if (bignum_compare(&ff, &tt) ≥ 0) {
        if (k ≥ 32) val.h |= 1 ≪ (k - 32); else val.l |= 1 ≪ k;
        bignum_dec(&ff, &tt, 1000000000);
        if (ff.a ≡ bignum_prec - 1) break; /* break if ff now zero */
      }
    }
  }
  if (k ≡ 0) val.l |= 1; /* add sticky bit if ff nonzero */
```

This code is used in section 78.
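The doubling step that drives the binary conversion can be tried in isolation. The following is a minimal sketch, assuming a plain array of radix-$10^9$ digits stored most significant first; the name `double_radix_1e9` and the array layout are hypothetical and differ from the bignum structure of section 59, which records its own bounds a and b.

```c
#include <stdint.h>

#define RADIX 1000000000u  /* 10^9 */

/* Doubles the number held in d[0..n-1], most significant digit first.
   Returns the carry out of the most significant digit (0 or 1). */
static uint32_t double_radix_1e9(uint32_t *d, int n) {
    uint32_t carry = 0;
    for (int i = n - 1; i >= 0; i--) {
        uint32_t x = 2 * d[i] + carry;  /* at most 1999999999, which fits in 32 bits */
        if (x >= RADIX) { carry = 1; d[i] = x - RADIX; }
        else            { carry = 0; d[i] = x; }
    }
    return carry;
}
```

Where this sketch returns the final carry to its caller, a routine that owns its digit bounds can instead store the carry one position to the left and extend the range of active digits.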

```
a: int, §59.
b: int, §59.
bignum = struct, §59.
bignum_compare: static int (), §61.
bignum_dec: static void (), §62.
bignum_prec = 157, §59.
bignum_times_ten: static void (), §60.
buf: static char [], §75.
buf0 = macro, §75.
dat: tetra [], §59.
dec_pt: register char *, §76.
exp: register int, §76.
h: tetra, §3.
l: tetra, §3.
p: register char *, §70.
packit: label, §78.
q: register char *, §70.
scan_const: int (), §68.
strncmp: int (), <string.h>.
tetra = unsigned int, §3.
val: octa, §69.
zero_octa: octa, §4.
zeros: register int, §76.
```

Although the input 'NaN.0' is illegal, strictly speaking, we silently convert it to #7ff000000000001—a number that would be output as 'NaN.00000000000002'.

This code is used in section 78.

**85.** Floating point remainders. In this section we implement the remainder of the floating point operations—one of which happens to be the operation of taking the remainder.

The easiest task remaining is to compare two floating point quantities. Routine fcomp returns -1 if y < z, 0 if y = z, +1 if y > z, and +2 if y and z are unordered.

```
⟨Subroutines 5⟩ +≡
  int fcomp ARGS((octa, octa));
  int fcomp(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa yf, zf;
    register int x;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    case 4 * nan + nan: case 4 * zro + nan: case 4 * num + nan: case 4 * inf + nan:
    case 4 * nan + zro: case 4 * nan + num: case 4 * nan + inf: return 2;
    case 4 * zro + zro: return 0;
    case 4 * zro + num: case 4 * num + zro: case 4 * zro + inf: case 4 * inf + zro:
    case 4 * num + num: case 4 * num + inf: case 4 * inf + num: case 4 * inf + inf:
      if (ys ≠ zs) x = 1;
      else if (y.h > z.h) x = 1;
      else if (y.h < z.h) x = -1;
      else if (y.l > z.l) x = 1;
      else if (y.l < z.l) x = -1;
      else return 0;
      break;
    }
    return (ys ≡ '-' ? -x : x);
  }
```
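The same four-way result can be reproduced on a host machine for experimentation. The sketch below is a hypothetical stand-in, not the MMIXware routine: it assumes that the host `double` is IEEE 754 binary64, and the name `fcomp_sketch` is invented for this illustration. Like fcomp, it compares the raw bit patterns as integers once the signs are known to agree.

```c
#include <stdint.h>
#include <string.h>

/* Returns -1, 0, +1, or +2 (unordered), following the convention described above. */
static int fcomp_sketch(double y, double z) {
    uint64_t yb, zb;
    const uint64_t sign = UINT64_C(1) << 63;
    if (y != y || z != z) return 2;             /* a NaN operand makes the pair unordered */
    memcpy(&yb, &y, sizeof yb);                 /* reinterpret the bits without aliasing issues */
    memcpy(&zb, &z, sizeof zb);
    if (((yb | zb) & ~sign) == 0) return 0;     /* +0 and -0 compare equal */
    if ((yb ^ zb) & sign) return (yb & sign) ? -1 : 1;  /* signs differ */
    if (yb == zb) return 0;
    int x = (yb > zb) ? 1 : -1;                 /* same sign: compare magnitudes as integers */
    return (yb & sign) ? -x : x;                /* reverse the order for negative operands */
}
```

The integer comparison works because, for operands of equal sign, the binary64 encoding is monotonic in the magnitude of the value.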

```
ARGS = macro (), §2.
exp: register int, §76.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
inf = 2, §36.
l: tetra, §3.
NaN: register bool, §70.
nan = 3, §36.
num = 1, §36.
octa = struct, §3.
ROUND_NEAR = 4, §30.
sign: int, §70.
val: octa, §69.
zro = 0, §36.
```

**86.** Several MMIX operations act on a single floating point number and accept an arbitrary rounding mode. For example, consider the operation of rounding to the nearest floating point integer:

```
⟨Subroutines 5⟩ +≡
  octa fintegerize ARGS((octa, int));
  octa fintegerize(z, r)
    octa z; /* the operand */
    int r; /* the rounding mode */
  {
    ftype zt;
    int ze;
    char zs;
    octa xf, zf;
    zt = funpack(z, &zf, &ze, &zs);
    if (¬r) r = cur_round;
    switch (zt) {
    case nan: if (¬(z.h & #80000)) { exceptions |= I_BIT; z.h |= #80000; }
    case inf: case zro: return z;
    case num: ⟨Integerize and return 87⟩;
    }
  }
```

**87.**

```
⟨Integerize and return 87⟩ ≡
  if (ze ≥ 1074) return fpack(zf, ze, zs, ROUND_OFF); /* already an integer */
  if (ze < 1020) xf.h = 0, xf.l = 1;
  else { octa oo;
    xf = shift_right(zf, 1074 - ze, 1);
    oo = shift_left(xf, 1074 - ze);
    if (oo.l ≠ zf.l ∨ oo.h ≠ zf.h) xf.l |= 1; /* sticky bit */
  }
  switch (r) {
  case ROUND_DOWN: if (zs ≡ '-') xf = incr(xf, 3); break;
  case ROUND_UP: if (zs ≠ '-') xf = incr(xf, 3);
  case ROUND_OFF: break;
  case ROUND_NEAR: xf = incr(xf, xf.l & 4 ? 2 : 1); break;
  }
  xf.l &= #fffffffc;
  if (ze ≥ 1022) return fpack(shift_left(xf, 1074 - ze), ze, zs, ROUND_OFF);
  if (xf.l) xf.h = #3ff00000, xf.l = 0;
  if (zs ≡ '-') xf.h |= sign_bit;
  return xf;
```

This code is used in section 86.
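The rounding switch in section 87 works on a magnitude that carries two extra low-order bits: a half bit and a sticky bit. A minimal sketch of that idea on a plain 64-bit integer follows; the function name `round_scaled` and its interface are hypothetical, while the rounding-mode codes are the ones defined in section 30.

```c
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP = 2, ROUND_DOWN = 3, ROUND_NEAR = 4 };

/* xf4 is a magnitude scaled by 4: bits 2 and up hold the integer part,
   bit 1 is the half bit, and bit 0 is the sticky bit.
   Returns the rounded integer magnitude; negative is the sign of the operand. */
static uint64_t round_scaled(uint64_t xf4, int negative, int mode) {
    switch (mode) {
    case ROUND_DOWN: if (negative)  xf4 += 3; break;   /* toward minus infinity */
    case ROUND_UP:   if (!negative) xf4 += 3; break;   /* toward plus infinity */
    case ROUND_OFF:  break;                            /* toward zero */
    case ROUND_NEAR: xf4 += (xf4 & 4) ? 2 : 1; break;  /* ties go to the even integer */
    }
    return xf4 >> 2;  /* discard the two guard bits */
}
```

Adding 3 bumps the integer part exactly when either guard bit is set. In the nearest mode the increment is 2 when the integer part is odd and 1 when it is even, so an exact tie (guard bits 10) moves to the even neighbor.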

**88.** To convert floating point to fixed point, we use fixit.

```
⟨Subroutines 5⟩ +≡
  octa fixit ARGS((octa, int));
  octa fixit(z, r)
    octa z; /* the operand */
    int r; /* the rounding mode */
  {
    ftype zt;
    int ze;
    char zs;
    octa zf, o;
    zt = funpack(z, &zf, &ze, &zs);
    if (¬r) r = cur_round;
    switch (zt) {
    case nan: case inf: exceptions |= I_BIT; return z;
    case zro: return zero_octa;
    case num: if (funpack(fintegerize(z, r), &zf, &ze, &zs) ≡ zro) return zero_octa;
      if (ze ≤ 1076) o = shift_right(zf, 1076 - ze, 1);
      else {
        if (ze > 1085 ∨ (ze ≡ 1085 ∧ (zf.h > #400000 ∨
              (zf.h ≡ #400000 ∧ (zf.l ∨ zs ≠ '-'))))) exceptions |= W_BIT;
        if (ze > 1140) return zero_octa;
        o = shift_left(zf, ze - 1076);
      }
      return (zs ≡ '-' ? ominus(zero_octa, o) : o);
    }
  }
```

```
ARGS = macro (), §2.
cur_round: int, §30.
exceptions: int, §32.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
I_BIT = macro, §31.
incr: octa (), §6.
inf = 2, §36.
l: tetra, §3.
nan = 3, §36.
num = 1, §36.
octa = struct, §3.
ominus: octa (), §5.
ROUND_DOWN = 3, §30.
ROUND_NEAR = 4, §30.
ROUND_OFF = 1, §30.
ROUND_UP = 2, §30.
shift_left: octa (), §7.
shift_right: octa (), §7.
sign_bit = macro, §4.
W_BIT = macro, §31.
zero_octa: octa, §4.
zro = 0, §36.
```

89. Going the other way, we can specify not only a rounding mode but whether the given fixed point octabyte is signed or unsigned, and whether the result should be rounded to short precision.

```
⟨Subroutines 5⟩ +≡
  octa floatit ARGS((octa, int, int, int));
  octa floatit(z, r, u, p)
    octa z; /* octabyte to float */
    int r; /* rounding mode */
    int u; /* unsigned? */
    int p; /* short precision? */
  {
    int e; char s;
    register int t;
    exceptions = 0;
    if (¬z.h ∧ ¬z.l) return zero_octa;
    if (¬r) r = cur_round;
    if (¬u ∧ (z.h & sign_bit)) s = '-', z = ominus(zero_octa, z); else s = '+';
    e = 1076;
    while (z.h < #400000) e--, z = shift_left(z, 1);
    while (z.h ≥ #800000) {
      e++;
      t = z.l & 1;
      z = shift_right(z, 1, 1);
      z.l |= t;
    }
    if (p) ⟨Convert to short float 90⟩;
    return fpack(z, e, s, r);
  }
```

**90.**

```
⟨Convert to short float 90⟩ ≡
  {
    register int ex; register tetra t;
    t = sfpack(z, e, s, r);
    ex = exceptions;
    sfunpack(t, &z, &e, &s);
    exceptions = ex;
  }
```

This code is used in section 89.

**91.** The square root operation is more interesting.

```
⟨Subroutines 5⟩ +≡
  octa froot ARGS((octa, int));
  octa froot(z, r)
    octa z; /* the operand */
    int r; /* the rounding mode */
  {
    ftype zt;
    int ze;
    char zs;
    octa x, xf, rf, zf;
    register int xe, k;
    if (¬r) r = cur_round;
    zt = funpack(z, &zf, &ze, &zs);
    if (zs ≡ '-' ∧ zt ≠ zro) exceptions |= I_BIT, x = standard_NaN;
    else switch (zt) {
    case nan: if (¬(z.h & #80000)) exceptions |= I_BIT, z.h |= #80000;
      return z;
    case inf: case
```

**92.** The square root can be found by an adaptation of the old pencil-and-paper method. If  $n = \lfloor \sqrt{s} \rfloor$ , where s is an integer, we have  $s = n^2 + r$  where  $0 \le r \le 2n$ ; this invariant can be maintained if we replace s by 4s + (0, 1, 2, 3) and n by 2n + (0, 1). The following code implements this idea with 2n in xf and r in rf. (It could easily be made to run about twice as fast.)

```
⟨Take the square root and return 92⟩ ≡
  xf.h = 0, xf.l = 2;
  xe = (ze + #3fe) ≫ 1;
  if (ze & 1) zf = shift_left(zf, 1);
  rf.h = 0, rf.l = (zf.h ≫ 22) - 1;
  for (k = 53; k; k--) {
    rf = shift_left(rf, 2); xf = shift_left(xf, 1);
    if (k ≥ 43) rf = incr(rf, (zf.h ≫ (2 * (k - 43))) & 3);
    else if (k ≥ 27) rf = incr(rf, (zf.l ≫ (2 * (k - 27))) & 3);
    if ((rf.l > xf.l ∧ rf.h ≥ xf.h) ∨ rf.h > xf.h) {
      xf.l++; rf = ominus(rf, xf); xf.l++;
    }
  }
  if (rf.h ∨ rf.l) xf.l++; /* sticky bit */
  return fpack(xf, xe, '+', r);
```

This code is used in section 91.
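The invariant $s = n^2 + r$ with $0 \le r \le 2n$ can be exercised on ordinary integers. The sketch below is an illustration under that assumption, not the MMIXware code: it keeps n itself rather than 2n, consumes a 64-bit operand two bits at a time, and uses the hypothetical name `isqrt_digitwise`.

```c
#include <stdint.h>

/* Returns floor(sqrt(s)) and stores the remainder s - n*n in *rem. */
static uint32_t isqrt_digitwise(uint64_t s, uint64_t *rem) {
    uint64_t n = 0, r = 0;
    for (int k = 31; k >= 0; k--) {
        r = (r << 2) | ((s >> (2 * k)) & 3);  /* s becomes 4s + (0, 1, 2, 3) */
        n <<= 1;                              /* tentatively, n becomes 2n */
        if (r > 2 * n) {                      /* r >= 2n + 1, so the new root bit is 1 */
            r -= 2 * n + 1;                   /* because (n + 1)^2 = n^2 + 2n + 1 */
            n += 1;
        }
    }
    *rem = r;
    return (uint32_t)n;
}
```

Each pass restores the invariant: after appending two bits the remainder is at most 8n + 3 in terms of the old n, and subtracting 2n' + 1 when possible (with n' the doubled root) brings it back to at most twice the new root. Keeping 2n instead of n, as the text does, replaces the multiplication by 2 with two unit increments around the subtraction.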

```
ARGS = macro (), §2.
cur_round: int, §30.
exceptions: int, §32.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
I_BIT = macro, §31.
incr: octa (), §6.
inf = 2, §36.
l: tetra, §3.
nan = 3, §36.
num = 1, §36.
octa = struct, §3.
ominus: octa (), §5.
sfpack: tetra (), §34.
sfunpack: ftype (), §38.
shift_left: octa (), §7.
shift_right: octa (), §7.
sign_bit = macro, §4.
standard_NaN: octa, §4.
tetra = unsigned int, §3.
zero_octa: octa, §4.
zro = 0, §36.
```

93. And finally, the genuine floating point remainder. Subroutine fremstep either calculates  $y \operatorname{rem} z$  or reduces y to a smaller number having the same remainder with respect to z. In the latter case the E\_BIT is set in exceptions. A third parameter, delta, gives a decrease in exponent that is acceptable for incomplete results; if delta is sufficiently large, say 2500, the correct result will always be obtained in one step of fremstep.

```
⟨Subroutines 5⟩ +≡
  octa fremstep ARGS((octa, octa, int));
  octa fremstep(y, z, delta)
    octa y, z;
    int delta;
  {
    ftype yt, zt;
    int ye, ze;
    char xs, ys, zs;
    octa x, xf, yf, zf;
    register int xe, thresh, odd;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + zro: case 4 * num + zro: case 4 * inf + zro: case 4 * inf + num:
    case 4 * inf + inf: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * zro + num: case 4 * zro + inf: case 4 * num + inf: return y;
    case 4 * num + num: ⟨Remainderize nonzero numbers and return 94⟩;
    zero_out: x = zero_octa;
    }
    if (ys ≡ '-') x.h |= sign_bit;
    return x;
  }
```

**94.** If there's a huge difference in exponents and the remainder is nonzero, this computation will take a long time. One could compute  $(2^n y)$  rem z much more quickly for large n by using  $O(\log n)$  multiplications modulo z, but the floating remainder operation isn't important enough to justify such expensive hardware.

Results of floating remainder are always exact, so the rounding mode is immaterial.

```
⟨Remainderize nonzero numbers and return 94⟩ ≡
  odd = 0; /* will be 1 if we've subtracted an odd multiple of z from y */
  thresh = ye - delta;
  if (thresh < ze) thresh = ze;
  while (ye ≥ thresh)
    ⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩;
  if (ye ≥ ze) {
    exceptions |= E_BIT; return fpack(yf, ye, ys, ROUND_OFF);
  }
  if (ye < ze - 1) return fpack(yf, ye, ys, ROUND_OFF);
  yf = shift_right(yf, 1, 1);
try_complement: xf = ominus(zf, yf), xe = ze, xs = '+' + '-' - ys;
  if (xf.h > yf.h ∨ (xf.h ≡ yf.h ∧ (xf.l > yf.l ∨ (xf.l ≡ yf.l ∧ ¬odd))))
    xf = yf, xs = ys;
  while (xf.h < #400000) xe--, xf = shift_left(xf, 1);
  return fpack(xf, xe, xs, ROUND_OFF);
```

This code is used in section 93.

**95.** Here we are careful not to change the sign of y, because a remainder of 0 is supposed to inherit the original sign of y.

```
⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩ ≡
  {
    if (yf.h ≡ zf.h ∧ yf.l ≡ zf.l) goto zero_out;
    if (yf.h < zf.h ∨ (yf.h ≡ zf.h ∧ yf.l < zf.l)) {
      if (ye ≡ ze) goto try_complement;
      ye--, yf = shift_left(yf, 1);
    }
    yf = ominus(yf, zf);
    if (ye ≡ ze) odd = 1;
    while (yf.h < #400000) ye--, yf = shift_left(yf, 1);
  }
```

This code is used in section 94.
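The trade-off mentioned in section 94 can be illustrated with ordinary integers. The sketch below is an analogy under stated assumptions, not the floating point algorithm: it uses the nonnegative integer remainder rather than the round-to-nearest remainder of the text, it requires z to be nonzero, and the names `shift_mod` and `pow2_mod` are hypothetical. The first routine takes n doubling steps, much as the loop above lowers the exponent one step at a time; the second reaches the same value with $O(\log n)$ multiplications modulo z.

```c
#include <stdint.h>

/* (2^n * y) mod z by n doubling steps; z must be nonzero. */
static uint32_t shift_mod(uint32_t y, uint32_t n, uint32_t z) {
    uint64_t r = y % z;
    while (n--) {
        r <<= 1;             /* now r < 2z */
        if (r >= z) r -= z;  /* one conditional subtraction restores r < z */
    }
    return (uint32_t)r;
}

/* The same value by repeated squaring; z must be nonzero. */
static uint32_t pow2_mod(uint32_t y, uint32_t n, uint32_t z) {
    uint64_t result = y % z, base = 2 % z;
    while (n) {
        if (n & 1) result = result * base % z;  /* both factors are below 2^32 */
        base = base * base % z;
        n >>= 1;
    }
    return (uint32_t)result;
}
```

The squaring version needs a full multiplier and a modular reduction at every step, which is the hardware cost that the text declines to pay.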

```
ARGS = macro (), §2.
E_BIT = macro, §31.
exceptions: int, §32.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
I_BIT = macro, §31.
inf = 2, §36.
l: tetra, §3.
num = 1, §36.
octa = struct, §3.
ominus: octa (), §5.
ROUND_OFF = 1, §30.
shift_left: octa (), §7.
shift_right: octa (), §7.
sign_bit = macro, §4.
standard_NaN: octa, §4.
zero_octa: octa, §4.
zro = 0, §36.
```

## 96. Names of the sections.

```
⟨Add nonzero numbers and return 47⟩ Used in section 46.
⟨Adjust for difference in exponents 49⟩ Used in section 47.
⟨Check that x < z; otherwise give trivial answer 14⟩ Used in section 13.
⟨Compare two numbers with respect to epsilon and return 51⟩ Used in section 50.
⟨Compute the difference of fraction parts, o 53⟩ Used in section 51.
⟨Compute the significant digits in the large-exponent case 65⟩ Used in section 64.
⟨Compute the significant digits s and decimal exponent e 64⟩ Used in section 54.
⟨Convert to short float 90⟩ Used in section 89.
⟨Determine the binary fraction and binary exponent 83⟩ Used in section 78.
⟨Determine the number of significant places n in the divisor v 16⟩ Used in section 13.
⟨Determine the quotient digit q[j] 20⟩ Used in section 13.
⟨Divide nonzero numbers and return 45⟩ Used in section 44.
⟨Exchange y with z 48⟩ Used in sections 47 and 51.
⟨Extract the exponent e and determine the fraction interval [f ... g] or (f ... g) 55⟩ Used in section 54.
⟨Find the trial quotient, q̂ 21⟩ Used in section 20.
⟨Global variables 4, 9, 30, 32, 69, 75⟩ Used in section 1.
⟨Handle the special case when the fraction part is zero 57⟩ Used in section 55.
⟨If the result was negative, decrease q̂ by 1 23⟩ Used in section 20.
⟨Integerize and return 87⟩ Used in section 86.
⟨Local variables for print_float 56, 66⟩ Used in section 54.
⟨Local variables for scan_const 70, 76, 81⟩ Used in section 68.
⟨Move the digits from buf to ff 79⟩ Used in section 78.
⟨Multiply nonzero numbers and return 43⟩ Used in section 41.
⟨Normalize the divisor 17⟩ Used in section 13.
⟨Other type definitions 36, 59⟩ Used in section 1.
⟨Pack and round the answer 84⟩ Used in section 78.
⟨Pack q and u to acc and aux 19⟩ Used in section 13.
⟨Pack w into the outputs aux and acc 11⟩ Used in section 8.
⟨Print the significant digits with proper context 67⟩ Used in section 54.
⟨Put the 9-digit number *p … *(p+8) into ff.dat[k] 80⟩ Used in section 79.
⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩ Used in section 94.
⟨Remainderize nonzero numbers and return 94⟩ Used in section 93.
⟨Return a floating point constant 78⟩ Used in section 73.
⟨Return infinity 72⟩ Used in section 68.
⟨Return the standard NaN 71⟩ Used in section 68.
⟨Round and return the result 33⟩ Used in section 31.
⟨Round and return the short result 35⟩ Used in section 34.
⟨Scan a fraction part 74⟩ Used in section 73.
⟨Scan a number and return 73⟩ Used in section 68.
⟨Scan an exponent 77⟩ Used in section 73.
⟨Store f and g as multiprecise integers 63⟩ Used in section 54.
⟨Stuff for C preprocessor 2⟩ Used in section 1.
⟨Subroutines 5, 6, 7, 8, 12, 13, 24, 25, 26, 27, 28, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46, 50, 54, 60, 61, 62, 68, 82, 85, 86, 88, 89, 91, 93⟩ Used in section 1.
⟨Subtract b^j q̂v from u 22⟩ Used in section 20.
⟨Take the square root and return 92⟩ Used in section 91.
⟨Tetrabyte and octabyte type definitions 3⟩ Used in section 1.
⟨The usual NaN cases 42⟩ Used in sections 41, 44, 46, and 93.
⟨Unnormalize the remainder 18⟩ Used in section 13.
⟨Unpack the dividend and divisor to u and v 15⟩ Used in section 13.
⟨Unpack the multiplier and multiplicand to u and v 10⟩ Used in section 8.
⟨Unsubnormalize y and z 52⟩ Used in section 51.
```

1. Input format. Configuration files allow this simulator to adapt itself to infinitely many possible combinations of hardware features. The purpose of the present module is to read a configuration file, check it for validity, and set up the relevant data structures.

All data in a configuration file consists simply of *tokens* separated by one or more units of white space, where a "token" is any sequence of nonspace characters that doesn't contain a percent sign. Percent signs and anything following them on a line are ignored; this convention allows a user to include comments in the file. Here's a simple (but weird) example:

It means that (1) the write buffer has capacity for 200 octabytes; (2) the memory bus takes 100 cycles to process an address; (3) there's a D-cache, in which each set has 4 blocks and the replacement policy is least-recently-used; (4) each block in the D-cache has 1024 bytes; (5) there are two functional units, one for all the odd-numbered opcodes and one for all the rest; (6) the division instructions take three pipeline stages, spending 40 cycles in the first stage, 30 in the second, and 20 in the last; (7) all other parameters have default values.
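The token rule stated above (white space separates tokens, and a percent sign starts a comment that runs to the end of the line) is simple enough to sketch directly. The following is a minimal illustration, not the reader used by the simulator; the function name `next_token`, its interface, and the truncation of overlong tokens are all assumptions made for this example. The caller must pass a capacity of at least 1.

```c
#include <stddef.h>

/* Copies the next token of *s into tok (capacity cap >= 1, including the
   terminating NUL) and advances *s past it. Returns 1 if a token was found
   and 0 at the end of the input. Overlong tokens are truncated in this sketch. */
static int next_token(const char **s, char *tok, size_t cap) {
    const char *p = *s;
    size_t n = 0;
    for (;;) {
        while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') p++;  /* skip white space */
        if (*p != '%') break;
        while (*p && *p != '\n') p++;  /* a comment runs to the end of the line */
    }
    if (!*p) { *s = p; return 0; }
    while (*p && *p != ' ' && *p != '\t' && *p != '\n' && *p != '\r' && *p != '%') {
        if (n + 1 < cap) tok[n++] = *p;  /* keep room for the NUL */
        p++;
    }
    tok[n] = '\0';
    *s = p;
    return 1;
}
```

Applied to a hypothetical input such as `writebuffer 200 % capacity` followed on later lines by `memaddresstime 100` and `div 40 30 20`, repeated calls yield the eight tokens `writebuffer`, `200`, `memaddresstime`, `100`, `div`, `40`, `30`, and `20`, with the comment skipped.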

2. Four kinds of specifications can appear in a configuration file, according to the following syntax:

```
⟨specification⟩ → ⟨PV spec⟩ | ⟨cache spec⟩ | ⟨pipe spec⟩ | ⟨functional spec⟩
```

**3.** A  $\langle$ PV spec  $\rangle$  simply assigns a given value to a given parameter. The possibilities for  $\langle$ parameter  $\rangle$  are as follows:

- fetchbuffer (default 4), maximum instructions in the fetch buffer; must be  $\geq 1$ .
- writebuffer (default 2), maximum octabytes in the write buffer; must be  $\geq 1$ .
- reorderbuffer (default 5), maximum instructions issued but not committed; must be  $\geq 1$ .
- renameregs (default 5), maximum partial results in the reorder buffer; must be  $\geq 1$ .

- memslots (default 2), maximum store instructions in the reorder buffer; must be > 1.
- localregs (default 256), number of local registers in ring; must be 256, 512, or 1024.
- fetchmax (default 2), maximum instructions fetched per cycle; must be  $\geq 1$ .
- dispatchmax (default 1), maximum instructions issued per cycle; must be  $\geq 1$ .
- peekahead (default 1), maximum lookahead for jumps per cycle.
- commitmax (default 1), maximum instructions committed per cycle; must be  $\geq 1$ .
- fremmax (default 1), maximum reductions in FREM computation per cycle; must be  $\geq 1$.
- denin (default 1), extra cycles taken if a floating point input is subnormal.
- denout (default 1), extra cycles taken if a floating point result is subnormal.
- writeholdingtime (default 0), minimum number of cycles for data to remain in the write buffer.
- memaddresstime (default 20), cycles to process memory address; must be  $\geq 1$ .
- memreadtime (default 20), cycles to read one memory busload; must be  $\geq 1$ .
- memory busload; must be  $\geq 1$ .
- membusbytes (default 8), number of bytes per memory busload; must be a power of 2 that is 8 or more.
- branchpredictbits (default 0), number of bits in each branch prediction table entry; must be < 8.
- branchaddressbits (default 0), number of bits in instruction address used to index the branch prediction table.
- branchhistorybits (default 0), number of bits in branch history used to index the branch prediction table.
- branchdualbits (default 0), number of bits of instruction-address-xor-branchhistory used to index the branch prediction table.
- hardwarepagetable (default 1), is zero if page table calculations must be emulated by the operating system.
- disablesecurity (default 0), is 1 if the hot-seat security checks are turned off. This option is used only for testing purposes; it means that the 's' interrupt will not occur, and the 'p' interrupt will be signaled only when going from a nonnegative location to a negative one.
- memchunksmax (default 1000), maximum number of  $2^{16}$ -byte chunks of simulated memory; must be  $\geq 1$ .
- hashprime (default 2003), prime number used to address simulated memory; must exceed memchunksmax, preferably by a factor of about 2.

The values of memchunksmax and hashprime affect only the speed of the simulator, not its results—unless a very huge program is being simulated. The stated defaults for memchunksmax and hashprime should be adequate for almost all applications.
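Several of the constraints in the list above are easy to state as predicates. The sketch below checks three of them; it is only an illustration of the stated rules, the function names are hypothetical, and it makes no attempt to test whether hashprime is actually prime.

```c
#include <stdint.h>

/* Returns 1 if v is a power of 2 (zero is not). */
static int is_power_of_2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

/* localregs must be 256, 512, or 1024. */
static int localregs_ok(uint64_t v) { return v == 256 || v == 512 || v == 1024; }

/* membusbytes must be a power of 2 that is 8 or more. */
static int membusbytes_ok(uint64_t v) { return is_power_of_2(v) && v >= 8; }

/* memchunksmax must be at least 1, and hashprime must exceed it. */
static int hashprime_ok(uint64_t hashprime, uint64_t memchunksmax) {
    return memchunksmax >= 1 && hashprime > memchunksmax;  /* primality is not checked here */
}
```

With the stated defaults, `hashprime_ok(2003, 1000)` holds, and the ratio of about 2 matches the recommendation in the list.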

4. A  $\langle$  cache spec  $\rangle$  assigns a given value to a parameter affecting one of five possible caches:

The possibilities for  $\langle$ cache parameter  $\rangle$  are as follows:

- associativity (default 1), number of cache blocks per cache set; must be a power of 2. (A cache with associativity 1 is said to be "direct-mapped.")
- blocksize (default 8), number of bytes per cache block; must be a power of 2, at least equal to the granularity, and at most equal to 8192. The blocksize of ITcache and DTcache must be 8.
- setsize (default 1), number of sets of cache blocks; must be a power of 2. (A cache with set size 1 is said to be "fully associative.")
- granularity (default 8), number of bytes per "dirty bit," used to remember which items of data have changed since they were read from memory; must be a power of 2 and at least 8. The granularity must be 8 if writeallocate is 0.
- victimsize (default 0), number of cache blocks in the victim buffer, which holds blocks removed from the main cache sets; must be zero or a power of 2.
- writeback (default 0), is 1 in a "write-back" cache, which holds dirty data as long as possible; is 0 in a "write-through" cache, which cleans all data as soon as possible.
- writeallocate (default 0), is 1 in a "write-allocate" cache, which remembers all recently written data; is 0 in a "write-around" cache, which doesn't make space for newly written data that fails to hit an existing cache block.
- accesstime (default 1), number of cycles to query the cache; must be  $\geq 1$ . (Hits in the S-cache actually require *twice* the accesstime, once to query the tag and once to transmit the data.)
- copyintime (default 1), number of cycles to move a cache block from its input buffer into the cache proper; must be  $\geq 1$ .
- copyouttime (default 1), number of cycles to move a cache block from the cache proper to its output buffer; must be  $\geq 1$ .
- ports (default 1), number of processes that can simultaneously query the cache; must be  $\geq 1$.

The  $\langle \text{policy} \rangle$  parameter should be nonempty only on cache specifications for parameters associativity and victimsize. If no replacement policy is specified, random is the default. All four policies are equivalent when the associativity or victimsize is 1; pseudolru is equivalent to lru when the associativity or victimsize is 2.

The granularity, writeback, writeallocate, and copyouttime parameters affect the performance only of the D-cache and S-cache; the other three caches are read-only, so they never need to write their data.

The ports parameter affects the performance of the D-cache and DT-cache, and (if the PREGO command is used) the performance of the I-cache and IT-cache. The S-cache accommodates only one process at a time, regardless of the number of specified ports.

Only the translation caches (the IT-cache and DT-cache) are present by default. But if any specifications are given for, say, an I-cache, all of the unspecified I-cache parameters take their default values.

The existence of an S-cache (secondary cache) implies the existence of both I-cache and D-cache (primary caches for instructions and data). The block size of the secondary cache must not be less than the block size of the primary caches. The secondary cache must have the same granularity as the D-cache.
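As an illustration of these rules, a configuration file might contain lines like the following. All of the numbers are invented for this sketch; it assumes that each cache spec is a cache name followed by a parameter name, a decimal value, and an optional policy, and that '%' begins a comment, which is how the input routines later in this module read the file.

```
% primary data cache: 64 sets, 4 blocks per set, 32 bytes per block
Dcache associativity 4 lru
Dcache blocksize 32
Dcache setsize 64
Dcache writeback 1
Dcache writeallocate 1
% secondary cache: its block size is not less than that of the primaries
Scache associativity 8 pseudolru
Scache blocksize 64
Scache setsize 256
Scache accesstime 5
```

Here the I-cache is never mentioned, so it comes into existence with default values because an S-cache was specified; both the D-cache and the S-cache keep the default granularity of 8, so their granularities agree.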

**5.** A  $\langle$  pipe spec  $\rangle$  governs the execution time of potentially slow operations.

```
⟨pipe spec⟩ → ⟨operation⟩ ⟨pipeline times⟩
⟨pipeline times⟩ → ⟨decimal value⟩ | ⟨pipeline times⟩ ⟨decimal value⟩
```

Here the  $\langle$  operation  $\rangle$  is one of the following:

- mul0 through mul8 (default 10); the values for mulj refer to products in which the second operand is less than  $2^{8j}$ , where j is as small as possible. Thus, for example, mul1 applies to nonzero one-byte multipliers.
- div (default 60); this applies to integer division, signed and unsigned.
- sh (default 1); this applies to left and right shifts, signed and unsigned.
- mux (default 1); the multiplex operator.
- sadd (default 1); the sideways addition operator.
- mor (default 1); the boolean matrix multiplication operators MOR and MXOR.
- fadd (default 4); floating point addition and subtraction.
- fmul (default 4); floating point multiplication.
- fdiv (default 40); floating point division.
- fsqrt (default 40); floating point square root.
- fint (default 4); floating point integerization.
- fix (default 2); conversion from floating to fixed, signed and unsigned.
- flot (default 2); conversion from fixed to floating, signed and unsigned.
- feps (default 4); floating comparison with respect to epsilon.

In each case one can specify a sequence of pipeline stages, with a positive number of cycles to be spent in each stage. For example, a specification like 'fmul 3 1' would say that a functional unit that supports FMUL takes a total of four cycles to compute the floating point product in two stages; it can start working on a second product after three cycles have gone by.
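The arithmetic of the 'fmul 3 1' example can be sketched as follows. The function names are hypothetical; the sketch assumes only what the paragraph above says, namely that the total time is the sum of the stage times and that the unit can accept another operation once the first stage has finished.

```c
#include <stddef.h>

/* Total cycles for one operation: the sum of all stage times. */
static int total_cycles(const int *stage, size_t nstages)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < nstages; i++) sum += stage[i];
    return sum;
}

/* Cycles spent in the first stage, after which the unit can start
   working on a second operation. */
static int first_stage_cycles(const int *stage, size_t nstages)
{
    return nstages > 0 ? stage[0] : 0;
}
```

With the stage list {3, 1}, `total_cycles` gives 4 and `first_stage_cycles` gives 3, matching the description of 'fmul 3 1'.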

If a floating point operation has a subnormal input, denin is added to the time for the first stage. If a floating point operation has a subnormal result, denout is added to the time for the last stage.

**6.** The fourth and final kind of specification defines a functional unit:

```
⟨functional spec⟩ → unit ⟨name⟩ ⟨64 hexadecimal digits⟩
```

The symbolic name should be at most fifteen characters long. The 64 hexadecimal digits contain 256 bits, with '1' for each supported opcode; the most significant (leftmost) bit is for opcode 0 (TRAP), and the least significant bit is for opcode 255 (TRIP).
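The bit layout just described can be sketched in a few lines. The helper `unit_digits` is hypothetical (it is not part of MMIXware); it turns a list of opcode numbers into the 64 hexadecimal digits of a functional spec, with opcode 0 in the leftmost bit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write into out the 64 hex digits (plus a
   terminating '\0') describing a unit that supports the given opcodes. */
static void unit_digits(const unsigned char *opcodes, size_t count, char out[65])
{
    unsigned char nibble[64];
    size_t i;
    memset(nibble, 0, sizeof nibble);
    for (i = 0; i < count; i++) {
        unsigned op = opcodes[i];
        /* digit op/4 holds opcodes 4k..4k+3, most significant bit first */
        nibble[op / 4] |= (unsigned char) (8u >> (op % 4));
    }
    for (i = 0; i < 64; i++) out[i] = "0123456789abcdef"[nibble[i]];
    out[64] = '\0';
}
```

For example, a unit that supports only opcode 0 (TRAP) and opcode 255 (TRIP) gets the digit 8 in the first position, the digit 1 in the last position, and zeros in between.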

For example, we can define a load/store unit (which handles register/memory operations), a multiplication unit (which handles fixed and floating point multiplication), a boolean unit (which handles only bitwise operations), and a more general arithmetic-logical unit, as follows:

The order in which units are specified is important, because MMIX's dispatcher will try to match each instruction with the first functional unit that supports its opcode. Therefore it is best to list more specialized units (like the BIT unit in this example) before more general ones; this lets the specialized units have first chance at the instructions they can handle.
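The first-match rule can be sketched as follows. The type `unit_sketch` and both functions are simplified stand-ins invented for this illustration; the actual dispatcher lives in the mmix-pipe module and does considerably more. The sketch assumes only the layout used elsewhere in this module, eight 32-bit words with opcode 0 in the top bit of word 0.

```c
#include <stdint.h>

/* Simplified stand-in for a functional unit (an assumption of this sketch). */
typedef struct {
    const char *name;
    uint32_t ops[8];   /* 256 bits; opcode 0 is the top bit of ops[0] */
} unit_sketch;

/* Nonzero if unit u supports the given opcode (0..255). */
static int supports(const unit_sketch *u, unsigned opcode)
{
    return (int) ((u->ops[opcode >> 5] >> (31 - (opcode & 31))) & 1u);
}

/* Index of the first unit in the list that supports opcode, or -1 if
   no unit does (the case that leads to an emulation trap). */
static int first_unit(const unit_sketch *units, int count, unsigned opcode)
{
    int j;
    for (j = 0; j < count; j++)
        if (supports(&units[j], opcode)) return j;
    return -1;
}
```

If a specialized unit and a general unit both support some opcode, listing the specialized unit first makes `first_unit` choose it, and the general unit is consulted only for the opcodes the specialized one lacks.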

There can be any number of functional units, having possibly identical specifications. One should, however, give each unit a unique name (e.g., ALU1 and ALU2 if there are two arithmetic-logical units), since these names are used in diagnostic messages.

Opcodes that aren't supported by any specified unit will cause an emulation trap.

**7.** Full details about the significance of all these parameters can be found in the mmix-pipe module, which defines and discusses the data structures that need to be configured and initialized.

Of course the specifications in a configuration file needn't make any sense, nor need they be practically achievable. We could, for example, specify a unit that handles only the two opcodes NXOR and DIVUI; we could specify 1-cycle division but pipelined 100-cycle shifts, or 1-cycle memory access but 100-cycle cache access. We could create a thousand rename registers and issue a hundred instructions per cycle, etc. Some combinations of parameters are clearly ridiculous.

But there remain a huge number of possibilities of interest, especially as technology continues to evolve. By experimenting with configurations that are extreme by present-day standards, we can see how much might be gained if the corresponding hardware could be built economically.

**8.** Basic input/output. Let's get ready to program the *MMIX\_config* subroutine by building some simple infrastructure. First we need some macros to print error messages.

```
#define errprint0(f) fprintf(stderr, f)
#define errprint1(f, a) fprintf(stderr, f, a)
#define errprint2(f, a, b) fprintf(stderr, f, a, b)
#define errprint3(f, a, b, c) fprintf(stderr, f, a, b, c)
#define panic(x) { x; errprint0("!\n"); exit(-1); }
```

**9.** And we need a place to look at the input.

```
#define BUF_SIZE 100 /* we don't need long lines */

⟨Global variables 9⟩ ≡
  FILE *config_file; /* input comes from here */
  char buffer[BUF_SIZE]; /* input lines go here */
  char token[BUF_SIZE]; /* and tokens are copied to here */
  char *buf_pointer = buffer; /* this is our current position */
  bool token_prescanned; /* does token contain the next token already? */
```

See also sections 15 and 28.

This code is used in section 38.

MMIX\_config: void (), §38.

stderr: FILE \*, <stdio.h>.

**10.** The *get\_token* routine copies the next token of input into the *token* buffer. After the input has ended, a final 'end' is appended.

```
⟨Subroutines 10⟩ ≡
  static void get_token ARGS((void));
  static void get_token() /* set token to the next token of the configuration file */
  {
    register char *p, *q;
    if (token_prescanned) {
      token_prescanned = false; return;
    }
    while (1) { /* scan past white space */
      if (*buf_pointer == '\0' || *buf_pointer == '\n' || *buf_pointer == '%') {
        if (!fgets(buffer, BUF_SIZE, config_file)) {
          strcpy(token, "end"); return;
        }
        if (strlen(buffer) == BUF_SIZE - 1 && buffer[BUF_SIZE - 2] != '\n')
          panic(errprint1("config file line too long: '%s...'", buffer));
        buf_pointer = buffer;
      } else if (!isspace(*buf_pointer)) break;
      else buf_pointer++;
    }
    for (p = buf_pointer, q = token; !isspace(*p) && *p != '%'; p++, q++) *q = *p;
    buf_pointer = p; *q = '\0';
    return;
  }
```

See also sections 11, 16, 22, 23, 30, and 31.

This code is used in section 38.

**11.** The *get\_int* routine is called when we wish to input a decimal value. It returns
−1 if the next token isn't a string of decimal digits.
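The decimal test at the heart of such a routine can be sketched as follows. This is a hypothetical stand-in, not the *get\_int* of this module: it works on a token that has already been extracted, whereas the real routine obtains its token from the input stream, and the overflow check is an addition made for this sketch.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical stand-in: return the value of tok if it is a nonempty
   string of decimal digits that fits in an int, otherwise -1. */
static int decimal_value(const char *tok)
{
    int n = 0;
    if (*tok == '\0') return -1;
    for (; *tok; tok++) {
        int d;
        if (!isdigit((unsigned char) *tok)) return -1;
        d = *tok - '0';
        if (n > (INT_MAX - d) / 10) return -1; /* would overflow (sketch-only check) */
        n = 10 * n + d;
    }
    return n;
}
```

Because a legal value is never negative, the result −1 is an unambiguous signal; the code for pipe specs below relies on it to detect the end of a list of pipeline times.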

**12.** A simple data structure makes it fairly easy to deal with parameter/value specifications.

```
⟨Type definitions 12⟩ ≡
  typedef struct {
    char name[20]; /* symbolic name */
    int *v; /* internal name */
    int defval; /* default value */
    int minval, maxval; /* minimum and maximum legal values */
    bool power_of_two; /* must it be a power of two? */
  } pv_spec;
```

See also sections 13 and 14.

This code is used in section 38.

**13.** Cache parameters are a bit more difficult, but still not bad.

```
⟨Type definitions 12⟩ +≡
  typedef enum {
    assoc, blksz, setsz, gran, vctsz, wrb, wra, acctm, citm, cotm, prts
  } c_param;
  typedef struct {
    char name[20]; /* symbolic name */
    c_param v; /* internal code */
    int defval; /* default value */
    int minval, maxval; /* minimum and maximum legal values */
    bool power_of_two; /* must it be a power of two? */
  } cpv_spec;
```

**14.** Operation codes are the easiest of all.

```
⟨Type definitions 12⟩ +≡
  typedef struct {
    char name[8]; /* symbolic name */
    internal_opcode v; /* internal code */
    int defval; /* default value */
  } op_spec;
```

```
ARGS = macro (), MMIX-PIPE §6.
bool = enum, MMIX-PIPE §11.
buf_pointer: char *, §9.
BUF_SIZE = 100, §9.
buffer: char [], §9.
config_file: FILE *, §9.
errprint1 = macro (), §8.
false = 0, MMIX-PIPE §11.
fgets: char *(), <stdio.h>.
internal_opcode = enum, MMIX-PIPE §49.
isspace: int (), <ctype.h>.
panic = macro (), §8.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.
```

**15.** Most of the parameters are external variables that are declared in the header file mmix-pipe.h; but some are private to this module. Here we define the main tables used below.

```
⟨Global variables 9⟩ +≡
  int fetch_buf_size, write_buf_size, reorder_buf_size, mem_bus_bytes, hardware_PT;
  int max_cycs = 60;
  pv_spec PV[] = {
    {"fetchbuffer", &fetch_buf_size, 4, 1, INT_MAX, false},
    {"writebuffer", &write_buf_size, 2, 1, INT_MAX, false},
    {"reorderbuffer", &reorder_buf_size, 5, 1, INT_MAX, false},
    {"renameregs", &max_rename_regs, 5, 1, INT_MAX, false},
    {"memslots", &max_mem_slots, 2, 1, INT_MAX, false},
    {"localregs", &lring_size, 256, 256, 1024, true},
    {"fetchmax", &fetch_max, 2, 1, INT_MAX, false},
    {"dispatchmax", &dispatch_max, 1, 1, INT_MAX, false},
    {"peekahead", &peekahead, 1, 0, INT_MAX, false},
    {"commitmax", &commit_max, 1, 1, INT_MAX, false},
    {"fremmax", &frem_max, 1, 1, INT_MAX, false},
    {"denin", &denin_penalty, 1, 0, INT_MAX, false},
    {"denout", &denout_penalty, 1, 0, INT_MAX, false},
    {"writeholdingtime", &holding_time, 0, 0, INT_MAX, false},
    {"memaddresstime", &mem_addr_time, 20, 1, INT_MAX, false},
    {"memreadtime", &mem_read_time, 20, 1, INT_MAX, false},
    {"memwritetime", &mem_write_time, 20, 1, INT_MAX, false},
    {"membusbytes", &mem_bus_bytes, 8, 8, INT_MAX, true},
    {"branchpredictbits", &bp_n, 0, 0, 8, false},
    {"branchaddressbits", &bp_a, 0, 0, 32, false},
    {"branchhistorybits", &bp_b, 0, 0, 32, false},
    {"branchdualbits", &bp_c, 0, 0, 32, false},
    {"hardwarepagetable", &hardware_PT, 1, 0, 1, false},
    {"disablesecurity", (int *) &security_disabled, 0, 0, 1, false},
    {"memchunksmax", &mem_chunks_max, 1000, 1, INT_MAX, false},
    {"hashprime", &hash_prime, 2003, 2, INT_MAX, false}};
  cpv_spec CPV[] = {
    {"associativity", assoc, 1, 1, INT_MAX, true},
    {"blocksize", blksz, 8, 8, 8192, true},
    {"setsize", setsz, 1, 1, INT_MAX, true},
    {"granularity", gran, 8, 8, 8192, true},
    {"victimsize", vctsz, 0, 0, INT_MAX, true},
    {"writeback", wrb, 0, 0, 1, false},
    {"writeallocate", wra, 0, 0, 1, false},
    {"accesstime", acctm, 1, 1, INT_MAX, false},
    {"copyintime", citm, 1, 1, INT_MAX, false},
    {"copyouttime", cotm, 1, 1, INT_MAX, false},
    {"ports", prts, 1, 1, INT_MAX, false}};
  op_spec OP[] = {
    {"mul0", mul0, 10}, {"mul1", mul1, 10}, {"mul2", mul2, 10},
    {"mul3", mul3, 10}, {"mul4", mul4, 10}, {"mul5", mul5, 10},
    {"mul6", mul6, 10}, {"mul7", mul7, 10}, {"mul8", mul8, 10},
    {"div", div, 60}, {"sh", sh, 1}, {"mux", mux, 1}, {"sadd", sadd, 1}, {"mor", mor, 1},
    {"fadd", fadd, 4}, {"fmul", fmul, 4}, {"fdiv", fdiv, 40}, {"fsqrt", fsqrt, 40},
    {"fint", fint, 4},
    {"fix", fix, 2}, {"flot", flot, 2}, {"feps", feps, 4}};
  int PV_size, CPV_size, OP_size; /* the number of entries in PV, CPV, OP */
```

```
acctm = 7, §13.
assoc = 0, §13.
blksz = 1, §13.
bp_a: int, MMIX-PIPE §150.
bp_b: int, MMIX-PIPE §150.
bp_c: int, MMIX-PIPE §150.
bp_n: int, MMIX-PIPE §150.
citm = 8, §13.
commit_max: int, MMIX-PIPE §59.
cotm = 9, §13.
cpv_spec = struct, §13.
denin_penalty: int, MMIX-PIPE §349.
denout_penalty: int, MMIX-PIPE §349.
dispatch_max: int, MMIX-PIPE §59.
div = 9, MMIX-PIPE §49.
fadd = 14, MMIX-PIPE §49.
false = 0, MMIX-PIPE §11.
fdiv = 16, MMIX-PIPE §49.
feps = 21, MMIX-PIPE §49.
fetch_max: int, MMIX-PIPE §59.
fint = 18, MMIX-PIPE §49.
fix = 19, MMIX-PIPE §49.
flot = 20, MMIX-PIPE §49.
fmul = 15, MMIX-PIPE §49.
frem_max: int, MMIX-PIPE §349.
fsqrt = 17, MMIX-PIPE §49.
gran = 3, §13.
hash_prime: int, MMIX-PIPE §207.
holding_time: int, MMIX-PIPE §247.
INT_MAX = macro, <limits.h>.
lring_size: int, MMIX-PIPE §86.
max_mem_slots: int, MMIX-PIPE §86.
max_rename_regs: int, MMIX-PIPE §86.
mem_addr_time: int, MMIX-PIPE §214.
mem_chunks_max: int, MMIX-PIPE §207.
mem_read_time: int, MMIX-PIPE §214.
mem_write_time: int, MMIX-PIPE §214.
mor = 13, MMIX-PIPE §49.
mul0 = 0, MMIX-PIPE §49.
mul1 = 1, MMIX-PIPE §49.
mul2 = 2, MMIX-PIPE §49.
mul3 = 3, MMIX-PIPE §49.
mul4 = 4, MMIX-PIPE §49.
mul5 = 5, MMIX-PIPE §49.
mul6 = 6, MMIX-PIPE §49.
mul7 = 7, MMIX-PIPE §49.
mul8 = 8, MMIX-PIPE §49.
mux = 11, MMIX-PIPE §49.
op_spec = struct, §14.
peekahead: int, MMIX-PIPE §59.
prts = 10, §13.
pv_spec = struct, §12.
sadd = 12, MMIX-PIPE §49.
security_disabled: bool, MMIX-PIPE §66.
setsz = 2, §13.
sh = 10, MMIX-PIPE §49.
true = 1, MMIX-PIPE §11.
vctsz = 4, §13.
wra = 6, §13.
wrb = 5, §13.
```

**16.** The *new\_cache* routine creates a **cache** structure with default values. (These default values are "hard-wired" into the program, not actually read from the *CPV* table.)

```
⟨Subroutines 10⟩ +≡
  static cache *new_cache ARGS((char *));
  static cache *new_cache(name)
    char *name;
  {
    register cache *c = (cache *) calloc(1, sizeof(cache));
    if (!c) panic(errprint1("Can't allocate %s", name));
    c->aa = 1; /* default associativity, should equal CPV[0].defval */
    c->bb = 8; /* default blocksize */
    c->cc = 1; /* default setsize */
    c->gg = 8; /* default granularity */
    c->vv = 0; /* default victimsize */
    c->repl = random; /* default replacement policy */
    c->vrepl = random; /* default victim replacement policy */
    c->mode = 0; /* default mode is write-through and write-around */
    c->access_time = c->copy_in_time = c->copy_out_time = 1;
    c->filler.ctl = &(c->filler_ctl);
    c->filler_ctl.ptr_a = (void *) c;
    c->filler_ctl.go.o.l = 4;
    c->flusher.ctl = &(c->flusher_ctl);
    c->flusher_ctl.ptr_a = (void *) c;
    c->flusher_ctl.go.o.l = 4;
    c->ports = 1;
    c->name = name;
    return c;
  }
```

**17.**

```
⟨Initialize to defaults 17⟩ ≡
  PV_size = (sizeof PV) / sizeof(pv_spec);
  CPV_size = (sizeof CPV) / sizeof(cpv_spec);
  OP_size = (sizeof OP) / sizeof(op_spec);
  ITcache = new_cache("ITcache");
  DTcache = new_cache("DTcache");
  Icache = Dcache = Scache = NULL;
  for (j = 0; j < PV_size; j++) *(PV[j].v) = PV[j].defval;
  for (j = 0; j < OP_size; j++) {
    pipe_seq[OP[j].v][0] = OP[j].defval;
    pipe_seq[OP[j].v][1] = 0; /* one stage */
  }
```

This code is used in section 38.

**18.** Reading the specs. Before we're ready to process the configuration file, we need to count the number of functional units, so that we know how much space to allocate for them.

A special background unit is always provided, just to make sure that TRAP and TRIP instructions are handled by somebody.

```
⟨Count and allocate the functional units 18⟩ ≡
  funit_count = 0;
  while (strcmp(token, "end") != 0) {
    get_token();
    if (strcmp(token, "unit") == 0) {
      funit_count++;
      get_token(); get_token(); /* a unit might be named unit or end */
    }
  }
  funit = (func *) calloc(funit_count + 1, sizeof(func));
  if (!funit) panic(errprint0("Can't allocate the functional units"));
  strcpy(funit[funit_count].name, "%%");
  funit[funit_count].ops[0] = 0x80000000; /* TRAP */
  funit[funit_count].ops[7] = 0x1; /* TRIP */
```

This code is used in section 38.

```
aa: int, MMIX-PIPE §167.
access_time: int, MMIX-PIPE §167.
ARGS = macro (), MMIX-PIPE §6.
bb: int, MMIX-PIPE §167.
cache = struct, MMIX-PIPE §167.
calloc: void *(), <stdlib.h>.
cc: int, MMIX-PIPE §167.
copy_in_time: int, MMIX-PIPE §167.
copy_out_time: int, MMIX-PIPE §167.
CPV: cpv_spec [], §15.
CPV_size: int, §15.
cpv_spec = struct, §13.
ctl: control *, MMIX-PIPE §23.
Dcache: cache *, MMIX-PIPE §168.
defval: int, §14.
defval: int, §12.
DTcache: cache *, MMIX-PIPE §168.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
filler: coroutine, MMIX-PIPE §167.
filler_ctl: control, MMIX-PIPE §167.
flusher: coroutine, MMIX-PIPE §167.
flusher_ctl: control, MMIX-PIPE §167.
func = struct, MMIX-PIPE §76.
funit: func *, MMIX-PIPE §77.
funit_count: int, MMIX-PIPE §77.
get_token: static void (), §10.
gg: int, MMIX-PIPE §167.
go = 72, MMIX-PIPE §49.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §38.
l: tetra, MMIX-PIPE §17.
mode: int, MMIX-PIPE §167.
name: char *, MMIX-PIPE §167.
name: char [], MMIX-PIPE §76.
o: octa, MMIX-PIPE §40.
OP: op_spec [], §15.
OP_size: int, §15.
op_spec = struct, §14.
ops: tetra [], MMIX-PIPE §76.
panic = macro (), §8.
pipe_seq: unsigned char [][], MMIX-PIPE §136.
ports: int, MMIX-PIPE §167.
ptr_a: void *, MMIX-PIPE §44.
PV: pv_spec [], §15.
PV_size: int, §15.
pv_spec = struct, §12.
random = 0, MMIX-PIPE §164.
repl: replace_policy, MMIX-PIPE §167.
Scache: cache *, MMIX-PIPE §168.
strcmp: int (), <string.h>.
strcpy: char *(), <string.h>.
token: char [], §9.
v: internal_opcode, §14.
v: int *, §12.
vrepl: replace_policy, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.
```

**19.** Now we can read the specifications and obey them. This program doesn't bother to be very tolerant of errors, nor does it try to be very efficient.

Incidentally, the specifications don't have to be broken into individual lines in any meaningful way. We simply read them token by token.

```
⟨Record all the specs 19⟩ ≡
  rewind(config_file);
  funit_count = 0;
  token[0] = '\0';
  while (strcmp(token, "end") != 0) {
    get_token();
    if (strcmp(token, "end") == 0) break;
    ⟨If token is a parameter name, process a PV spec 20⟩;
    ⟨If token is a cache name, process a cache spec 21⟩;
    ⟨If token is an operation name, process a pipe spec 24⟩;
    if (strcmp(token, "unit") == 0) ⟨Process a functional spec 25⟩;
    panic(errprint1("Configuration syntax error: Specification can't start with '%s'", token));
  }
```

This code is used in section 38.

**20.**

```
⟨If token is a parameter name, process a PV spec 20⟩ ≡
  for (j = 0; j < PV_size; j++)
    if (strcmp(token, PV[j].name) == 0) {
      n = get_int();
      if (n < PV[j].minval)
        panic(errprint2("Configuration error: %s must be >= %d", PV[j].name, PV[j].minval));
      if (n > PV[j].maxval)
        panic(errprint2("Configuration error: %s must be <= %d", PV[j].name, PV[j].maxval));
      if (PV[j].power_of_two && (n & (n - 1)))
        panic(errprint1("Configuration error: %s must be a power of 2", PV[j].name));
      *(PV[j].v) = n;
      break;
    }
  if (j < PV_size) continue;
```

This code is used in section 19.

**21.**

```
⟨If token is a cache name, process a cache spec 21⟩ ≡
  if (strcmp(token, "ITcache") == 0) {
    pcs(ITcache); continue;
  } else if (strcmp(token, "DTcache") == 0) {
    pcs(DTcache); continue;
  } else if (strcmp(token, "Icache") == 0) {
    if (!Icache) Icache = new_cache("Icache");
    pcs(Icache); continue;
  } else if (strcmp(token, "Dcache") == 0) {
    if (!Dcache) Dcache = new_cache("Dcache");
    pcs(Dcache); continue;
  } else if (strcmp(token, "Scache") == 0) {
    if (!Icache) Icache = new_cache("Icache");
    if (!Dcache) Dcache = new_cache("Dcache");
    if (!Scache) Scache = new_cache("Scache");
    pcs(Scache); continue;
  }
```

This code is used in section 19.

**22.**

```
⟨Subroutines 10⟩ +≡
  static void ppol ARGS((replace_policy *));
  static void ppol(rr) /* subroutine to scan for a replacement policy */
    replace_policy *rr;
  {
    get_token();
    if (strcmp(token, "random") == 0) *rr = random;
    else if (strcmp(token, "serial") == 0) *rr = serial;
    else if (strcmp(token, "pseudolru") == 0) *rr = pseudo_lru;
    else if (strcmp(token, "lru") == 0) *rr = lru;
    else token_prescanned = true; /* oops, we should rescan that token */
  }
```

```
ARGS = macro (), MMIX-PIPE §6.
config_file: FILE *, §9.
Dcache: cache *, MMIX-PIPE §168.
DTcache: cache *, MMIX-PIPE §168.
errprint1 = macro (), §8.
errprint2 = macro (), §8.
funit_count: int, MMIX-PIPE §77.
get_int: static int (), §11.
get_token: static void (), §10.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §38.
lru = 3, MMIX-PIPE §164.
maxval: int, §12.
minval: int, §12.
n: register int, §38.
name: char [], §12.
new_cache: static cache *(), §16.
panic = macro (), §8.
pcs: static void (), §23.
power_of_two: bool, §12.
pseudo_lru = 2, MMIX-PIPE §164.
PV: pv_spec [], §15.
PV_size: int, §15.
random = 0, MMIX-PIPE §164.
replace_policy = enum, MMIX-PIPE §164.
rewind: void (), <stdio.h>.
Scache: cache *, MMIX-PIPE §168.
serial = 1, MMIX-PIPE §164.
strcmp: int (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.
true = 1, MMIX-PIPE §11.
v: int *, §12.
```

**23.**

```
⟨Subroutines 10⟩ +≡
  static void pcs ARGS((cache *));
  static void pcs(c) /* subroutine to process a cache spec */
    cache *c;
  {
    register int j, n;
    get_token();
    for (j = 0; j < CPV_size; j++)
      if (strcmp(token, CPV[j].name) == 0) break;
    if (j == CPV_size)
      panic(errprint1("Configuration syntax error: '%s' isn't a cache parameter name", token));
    n = get_int();
    if (n < CPV[j].minval)
      panic(errprint2("Configuration error: %s must be >= %d", CPV[j].name, CPV[j].minval));
    if (n > CPV[j].maxval)
      panic(errprint2("Configuration error: %s must be <= %d", CPV[j].name, CPV[j].maxval));
    if (CPV[j].power_of_two && (n & (n - 1)))
      panic(errprint1("Configuration error: %s must be power of 2", CPV[j].name));
    switch (CPV[j].v) {
    case assoc: c->aa = n; ppol(&(c->repl)); break;
    case blksz: c->bb = n; break;
    case setsz: c->cc = n; break;
    case gran: c->gg = n; break;
    case vctsz: c->vv = n; ppol(&(c->vrepl)); break;
    case wrb: c->mode = (c->mode & ~WRITE_BACK) + n * WRITE_BACK; break;
    case wra: c->mode = (c->mode & ~WRITE_ALLOC) + n * WRITE_ALLOC; break;
    case acctm: if (n > max_cycs) max_cycs = n;
      c->access_time = n; break;
    case citm: if (n > max_cycs) max_cycs = n;
      c->copy_in_time = n; break;
    case cotm: if (n > max_cycs) max_cycs = n;
      c->copy_out_time = n; break;
    case prts: c->ports = n; break;
    }
  }
```

**24.**

```
⟨If token is an operation name, process a pipe spec 24⟩ ≡
  for (j = 0; j < OP_size; j++)
    if (strcmp(token, OP[j].name) == 0) {
      for (i = 0; ; i++) {
        n = get_int();
        if (n < 0) break;
        if (n == 0)
          panic(errprint0("Configuration error: Pipeline cycles must be positive"));
        if (n > 255)
          panic(errprint0("Configuration error: Pipeline cycles must be <= 255"));
        if (n > max_cycs) max_cycs = n;
        if (i >= pipe_limit)
          panic(errprint1("Configuration error: More than %d pipeline stages", pipe_limit));
        pipe_seq[OP[j].v][i] = n;
      }
      token_prescanned = true;
      break;
    }
  if (j < OP_size) continue;
```

This code is used in section 19.


```
aa: int, MMIX-PIPE §167.
access_time: int, MMIX-PIPE §167.
acctm = 7, §13.
ARGS = macro (), MMIX-PIPE §6.
assoc = 0, §13.
bb: int, MMIX-PIPE §167.
blksz = 1, §13.
cache = struct, MMIX-PIPE §167.
cc: int, MMIX-PIPE §167.
citm = 8, §13.
copy_in_time: int, MMIX-PIPE §167.
copy_out_time: int, MMIX-PIPE §167.
cotm = 9, §13.
CPV: cpv_spec [], §15.
CPV_size: int, §15.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
errprint2 = macro (), §8.
get_int: static int (), §11.
get_token: static void (), §10.
gg: int, MMIX-PIPE §167.
gran = 3, §13.
i: register int, §38.
j: register int, §38.
max_cycs: int, §15.
maxval: int, §13.
minval: int, §13.
mode: int, MMIX-PIPE §167.
n: register int, §38.
name: char [], §13.
name: char [], §14.
OP: op_spec [], §15.
OP_size: int, §15.
panic = macro (), §8.
pipe_limit = 90, MMIX-PIPE §136.
pipe_seq: unsigned char [][], MMIX-PIPE §136.
ports: int, MMIX-PIPE §167.
power_of_two: bool, §13.
ppol: static void (), §22.
prts = 10, §13.
repl: replace_policy, MMIX-PIPE §167.
setsz = 2, §13.
strcmp: int (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.
true = 1, MMIX-PIPE §11.
v: c_param, §13.
v: internal_opcode, §14.
vctsz = 4, §13.
vrepl: replace_policy, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.
wra = 6, §13.
wrb = 5, §13.
WRITE_ALLOC = 2, MMIX-PIPE §166.
WRITE_BACK = 1, MMIX-PIPE §166.
```

**25.**

```
⟨Process a functional spec 25⟩ ≡
  {
    get_token();
    if (strlen(token) > 15)
      panic(errprint1("Configuration error: '%s' is more than 15 characters long", token));
    strcpy(funit[funit_count].name, token);
    get_token();
    if (strlen(token) != 64)
      panic(errprint1("Configuration error: unit %s doesn't have 64 hex digit specs",
          funit[funit_count].name));
    for (i = j = n = 0; j < 64; j++) {
      if (token[j] >= '0' && token[j] <= '9') n = (n << 4) + (token[j] - '0');
      else if (token[j] >= 'a' && token[j] <= 'f') n = (n << 4) + (token[j] - 'a' + 10);
      else if (token[j] >= 'A' && token[j] <= 'F') n = (n << 4) + (token[j] - 'A' + 10);
      else
        panic(errprint1("Configuration error: '%c' is not a hex digit", token[j]));
      if ((j & 0x7) == 0x7) funit[funit_count].ops[i++] = n, n = 0;
    }
    funit_count++;
    continue;
  }
```

This code is used in section 19.
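The digit-packing loop of the preceding section can be tried in isolation with a sketch like the one below. The function `parse_unit_bits` is hypothetical: it reports errors through its return value instead of calling *panic*, and it uses `uint32_t` where the module uses its own **tetra** type. Every eight hexadecimal digits fill one 32-bit word, so the 64 digits fill eight words.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-alone version of the loop: pack 64 hex digits into
   eight 32-bit words. Returns 0 on success, -1 on malformed input. */
static int parse_unit_bits(const char *digits, uint32_t ops[8])
{
    uint32_t n = 0;
    int i = 0, j;
    if (strlen(digits) != 64) return -1;
    for (j = 0; j < 64; j++) {
        char ch = digits[j];
        int v;
        if (ch >= '0' && ch <= '9') v = ch - '0';
        else if (ch >= 'a' && ch <= 'f') v = ch - 'a' + 10;
        else if (ch >= 'A' && ch <= 'F') v = ch - 'A' + 10;
        else return -1; /* not a hex digit */
        n = (n << 4) | (uint32_t) v;
        if ((j & 7) == 7) { ops[i++] = n; n = 0; } /* a word is complete */
    }
    return 0;
}
```

A string consisting of '8', sixty-two zeros, and '1' yields the same two words that section 18 stores for the background unit: 0x80000000 in the first word (TRAP) and 1 in the last (TRIP).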

**26.** Checking and allocating. The battle is only half over when we've absorbed all the data of the configuration file. We still must check for interactions between different quantities, and we must allocate space for cache blocks, coroutines, etc.

One of the most difficult tasks facing us is to determine the maximum number of pipeline stages needed by each functional unit. Let's tackle that first.

```
⟨Allocate coroutines in each functional unit 26⟩ ≡
  ⟨Build table of pipeline stages needed for each opcode 27⟩;
  for (j = 0; j <= funit_count; j++) {
    ⟨Determine the number of stages, n, needed by funit[j] 29⟩;
    funit[j].k = n;
    funit[j].co = (coroutine *) calloc(n, sizeof(coroutine));
    for (i = 0; i < n; i++) {
      funit[j].co[i].name = funit[j].name;
      funit[j].co[i].stage = i + 1;
    }
  }
```

This code is used in section 38.

**27.**

```
⟨Build table of pipeline stages needed for each opcode 27⟩ ≡
  for (j = div; j <= max_pipe_op; j++) int_stages[j] = strlen(pipe_seq[j]);
  for (; j <= max_real_command; j++) int_stages[j] = 1;
  for (j = mul0, n = 0; j <= mul8; j++)
    if (strlen(pipe_seq[j]) > n) n = strlen(pipe_seq[j]);
  int_stages[mul] = n;
  int_stages[ld] = int_stages[st] = int_stages[frem] = 2;
  for (j = 0; j < 256; j++) stages[j] = int_stages[int_op[j]];
```

This code is used in section 26.

```
calloc: void *(), <stdlib.h>.
co: coroutine *,
    MMIX-PIPE §76.
coroutine = struct,
    MMIX-PIPE §23.
div = 9, MMIX-PIPE §49.
errprint1 = macro (), §8.
frem = 25, MMIX-PIPE §49.
funit: func *, MMIX-PIPE §77.
funit_count: int,
    MMIX-PIPE §77.
get_token: static void (), §10.
i: register int, §38.
```

```
int_op: internal_opcode [], §28.
int_stages: int [], §28.
j: register int, §38.
k: int, MMIX-PIPE §76.
ld = 56, MMIX-PIPE §49.
max_pipe_op = feps,
MMIX-PIPE §49.
max_real_command = trip,
MMIX-PIPE §49.
mul = 26, MMIX-PIPE §49.
mul0 = 0, MMIX-PIPE §49.
mul8 = 8, MMIX-PIPE §49.
```

**28.** The *int\_op* conversion table is similar to the *internal\_op* array of the *MMIX\_run* routine, but it replaces *divu* by *div*, *fsub* by *fadd*, etc.

```
⟨Global variables 9⟩ +≡
  internal_opcode int_op[256] = {
    trap, fcmp, funeq, funeq, fadd, fix, fadd, fix,
    flot, flot, flot, flot, flot, flot, flot, flot,
    fmul, feps, feps, feps, fdiv, fsqrt, frem, fint,
    mul, mul, mul, mul, div, div, div, div,
    add, add, addu, addu, sub, sub, subu, subu,
    addu, addu, addu, addu, addu, addu, addu, addu,
    cmp, cmp, cmpu, cmpu, sub, sub, subu, subu,
    sh, sh, sh, sh, sh, sh, sh, sh,
    br, br, br, br, br, br, br, br,
    br, br, br, br, br, br, br, br,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    cset, cset, cset, cset, cset, cset, cset, cset,
    cset, cset, cset, cset, cset, cset, cset, cset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, prego, prego, go, go,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, pushgo, pushgo,
    or, or, orn, orn, nor, nor, xor, xor,
    and, and, andn, andn, nand, nand, nxor, nxor,
    bdif, bdif, wdif, wdif, tdif, tdif, odif, odif,
    mux, mux, sadd, sadd, mor, mor, mor, mor,
    set, set, set, set, addu, addu, addu, addu,
    or, or, or, or, andn, andn, andn, andn,
    noop, noop, pushj, pushj, set, set, put, put,
    pop, resume, save, unsave, sync, noop, get, trip };
  int int_stages[max_real_command + 1];  /* stages as function of internal_opcode */
  int stages[256];                       /* stages as function of mmix_opcode */
```

**29.**

```
⟨Determine the number of stages, n, needed by funit[j] 29⟩ ≡
  for (i = n = 0; i < 256; i++)
    if (((funit[j].ops[i >> 5] << (i & 0x1f)) & 0x80000000) && stages[i] > n) n = stages[i];
  if (n == 0) panic(errprint1("Configuration error: unit %s doesn't do anything",
          funit[j].name));
```

This code is used in section 26.

**30.** The next hardest thing on our agenda is to set up the cache structure fields that depend on the parameters. For example, although we have defined the parameter in the bb field (the block size), we also need to compute the b field (log of the block size), and we must create the cache blocks themselves.

```
⟨Subroutines 10⟩ +≡
  static int lg ARGS((int));
  static int lg(n)   /* compute binary logarithm */
       int n;
  { register int j, l;
    for (j = n, l = 0; j; j >>= 1) l++;
    return l - 1;
  }
```
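As a quick check of what *lg* computes, here is a minimal sketch in modern C. The name `floor_lg` and the helper `is_power_of_two` are hypothetical and not part of MMIXware. The loop counts the significant bits of *n*, so the result is the floor of the binary logarithm for positive *n*, and −1 when *n* is zero.

```c
#include <assert.h>

/* Floor of the binary logarithm, using the same loop as lg;
   yields -1 for n == 0 because the loop body never runs. */
static int floor_lg(int n)
{
    int j, l;
    assert(n >= 0);  /* assumption: callers never pass a negative size */
    for (j = n, l = 0; j; j >>= 1) l++;
    return l - 1;
}

/* Hypothetical companion check: n is a power of two exactly when
   shifting 1 left by floor_lg(n) reproduces n. */
static int is_power_of_two(int n)
{
    return n > 0 && (1 << floor_lg(n)) == n;
}
```

With a block size of 64 bytes, for example, `floor_lg(64)` is 6, which is the value the *b* field would receive.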

add = 29, MMIX-PIPE §49. addu = 30, MMIX-PIPE §49. and = 37, MMIX-PIPE §49. andn = 38, MMIX-PIPE §49. ARGS = macro (), MMIX-PIPE §6. b: **int**, MMIX-PIPE §167. bb: **int**, MMIX-PIPE §167. bdif = 48, MMIX-PIPE §49. br = 69, MMIX-PIPE §49. cmp = 46, MMIX-PIPE §49. cmpu = 47, MMIX-PIPE §49. cset = 53, MMIX-PIPE §49. div = 9, MMIX-PIPE §49. divu = 28, MMIX-PIPE §49. errprint1 = macro (), §8. fadd = 14, MMIX-PIPE §49. fcmp = 22, MMIX-PIPE §49. fdiv = 16, MMIX-PIPE §49. feps = 21, MMIX-PIPE §49. fint = 18, MMIX-PIPE §49. fix = 19, MMIX-PIPE §49. flot = 20, MMIX-PIPE §49. fmul = 15, MMIX-PIPE §49. frem = 25, MMIX-PIPE §49. fsqrt = 17, MMIX-PIPE §49. fsub = 24, MMIX-PIPE §49. funeq = 23, MMIX-PIPE §49.

funit: func \*, MMIX-PIPE §77. get = 54, MMIX-PIPE §49. go = 72, MMIX-PIPE §49. i: register int, §38. internal\_op: internal\_opcode [], MMIX-PIPE §51. internal\_opcode = enum, MMIX-PIPE §49. j: register int, §38. ld = 56, MMIX-PIPE §49. max\_real\_command = trip, MMIX-PIPE §49. mmix\_opcode = enum, MMIX-PIPE §47. MMIX\_run: void (), MMIX-PIPE §10. mor = 13, MMIX-PIPE §49. mul = 26, MMIX-PIPE §49. mux = 11, MMIX-PIPE §49. n: register int, §38. name: char [], MMIX-PIPE §76. nand = 39, MMIX-PIPE §49. noop = 81, MMIX-PIPE §49. nor = 36, MMIX-PIPE §49. nxor = 41, MMIX-PIPE §49. odif = 51, MMIX-PIPE §49. ops: tetra [], MMIX-PIPE §76.

or = 34, MMIX-PIPE §49. orn = 35, MMIX-PIPE §49. panic = macro (), §8. pbr = 70, MMIX-PIPE §49. pop = 75, MMIX-PIPE §49. prego = 73, MMIX-PIPE §49. pushgo = 74, MMIX-PIPE §49. pushj = 71, MMIX-PIPE §49. put = 55, MMIX-PIPE §49. resume = 76, MMIX-PIPE §49. sadd = 12, MMIX-PIPE §49. save = 77, MMIX-PIPE §49. set: cacheset \*, MMIX-PIPE §167. sh = 10, MMIX-PIPE §49. st = 63, MMIX-PIPE §49. sub = 31, MMIX-PIPE §49. subu = 32, MMIX-PIPE §49. sync = 79, MMIX-PIPE §49. tdif = 50, MMIX-PIPE §49. trap = 82, MMIX-PIPE §49. trip = 83, MMIX-PIPE §49. unsave = 78, MMIX-PIPE §49. wdif = 49, MMIX-PIPE §49. xor = 40, MMIX-PIPE §49. zset = 52, MMIX-PIPE §49.

**31.**

```
⟨Subroutines 10⟩ +≡
  static void alloc_cache ARGS((cache *, char *));
  static void alloc_cache(c, name)
       cache *c;
       char *name;
  { register int j, k;
    if (c->bb < c->gg)
      panic(errprint1("Configuration error: blocksize of %s is less than granularity",
            name));
    if (name[1] == 'T' && c->bb != 8)
      panic(errprint1("Configuration error: blocksize of %s must be 8", name));
    c->a = lg(c->aa);
    c->b = lg(c->bb);
    c->c = lg(c->cc);
    c->g = lg(c->gg);
    c->v = lg(c->vv);
    c->tagmask = -(1 << (c->b + c->c));
    if (c->a + c->b + c->c >= 32)
      panic(errprint1("Configuration error: %s has >= 4 gigabytes of data",
            name));
    if (c->gg != 8 && !(c->mode & WRITE_ALLOC)) panic(errprint2("Configuration error\
    ⟨Allocate the cache sets for cache c 32⟩;
    if (c->vv) ⟨Allocate the victim cache for cache c 33⟩;
    c->inbuf.dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->inbuf.dirty)
      panic(errprint1("Can't allocate dirty bits for inbuffer of %s", name));
    c->inbuf.data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->inbuf.data)
      panic(errprint1("Can't allocate data for inbuffer of %s", name));
    c->outbuf.dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->outbuf.dirty)
      panic(errprint1("Can't allocate dirty bits for outbuffer of %s", name));
    c->outbuf.data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->outbuf.data)
      panic(errprint1("Can't allocate data for outbuffer of %s", name));
    if (name[0] != 'S') ⟨Allocate reader coroutines for cache c 34⟩;
  }
```

**32.**

```
#define sign_bit 0x80000000
⟨Allocate the cache sets for cache c 32⟩ ≡
  c->set = (cacheset *) calloc(c->cc, sizeof(cacheset));
  if (!c->set) panic(errprint1("Can't allocate cache sets for %s", name));
  for (j = 0; j < c->cc; j++) {
    c->set[j] = (cacheblock *) calloc(c->aa, sizeof(cacheblock));
    if (!c->set[j])
      panic(errprint2("Can't allocate cache blocks for set %d of %s", j, name));
    for (k = 0; k < c->aa; k++) {
      c->set[j][k].tag.h = sign_bit;   /* invalid tag */
      c->set[j][k].dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
      if (!c->set[j][k].dirty)
        panic(errprint3("Can't allocate dirty bits for block %d of set %d of %s",
              k, j, name));
      c->set[j][k].data = (octa *) calloc(c->bb >> 3, sizeof(octa));
      if (!c->set[j][k].data)
        panic(errprint3("Can't allocate data for block %d of set %d of %s", k, j,
              name));
    }
  }
```

This code is used in section 31.

**33.**

```
⟨Allocate the victim cache for cache c 33⟩ ≡
  c->victim = (cacheblock *) calloc(c->vv, sizeof(cacheblock));
  if (!c->victim)
    panic(errprint1("Can't allocate blocks for victim cache of %s", name));
  for (k = 0; k < c->vv; k++) {
    c->victim[k].tag.h = sign_bit;   /* invalid tag */
    c->victim[k].dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->victim[k].dirty)
      panic(errprint2("Can't allocate dirty bits for block %d of victim cache of %s",
            k, name));
    c->victim[k].data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->victim[k].data)
      panic(errprint2("Can't allocate data for block %d of victim cache of %s",
            k, name));
  }
```

This code is used in section 31.
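The sizes that *alloc\_cache* derives from the block size *bb*, the number of sets *cc*, and the granularity *gg* can be worked out by hand. The sketch below is only an illustration: the struct `cache_geometry` and the functions `floor_lg` and `derive_geometry` are hypothetical names, and only the formulas (`bb >> g`, `bb >> 3`, and `-(1 << (b + c))`) come from the code above.

```c
#include <assert.h>

/* Hypothetical summary of the per-cache quantities computed in alloc_cache. */
struct cache_geometry {
    int b, c, g;          /* binary logs of block size, set count, granularity */
    int dirty_per_block;  /* dirty flags per block: bb >> g */
    int octas_per_block;  /* octabytes of data per block: bb >> 3 */
    int tagmask;          /* -(1 << (b + c)): clears the offset and set-index bits */
};

/* Same loop as lg: floor of the binary logarithm, -1 for zero. */
static int floor_lg(int n)
{
    int j, l;
    for (j = n, l = 0; j; j >>= 1) l++;
    return l - 1;
}

static struct cache_geometry derive_geometry(int bb, int cc, int gg)
{
    struct cache_geometry r;
    assert(gg > 0 && cc > 0);
    assert(bb >= gg);            /* the blocksize-versus-granularity check */
    r.b = floor_lg(bb);
    r.c = floor_lg(cc);
    r.g = floor_lg(gg);
    assert(r.b + r.c < 31);      /* keep the shift within a 32-bit int */
    r.dirty_per_block = bb >> r.g;
    r.octas_per_block = bb >> 3;
    r.tagmask = -(1 << (r.b + r.c));
    return r;
}
```

With 64-byte blocks, 256 sets, and granularity 8, each block gets 8 dirty flags and 8 octabytes, and the tag mask clears the low 14 bits of an address.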

```
panic = macro (), §8.
set: cacheset *, MMIX-PIPE §167.
tag: octa, MMIX-PIPE §167.
tagmask: int, MMIX-PIPE §167.
v: int, MMIX-PIPE §167.
victim: cacheset, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.
WRITE_ALLOC = 2, MMIX-PIPE §166.
```

**34.**

```
⟨Allocate reader coroutines for cache c 34⟩ ≡
  c->reader = (coroutine *) calloc(c->ports, sizeof(coroutine));
  if (!c->reader) panic(errprint1("Can't allocate readers for %s", name));
  for (j = 0; j < c->ports; j++) {
    c->reader[j].stage = vanish;
    c->reader[j].name = (name[0] == 'D' ? (name[1] == 'T' ? "DTreader" : "Dreader") :
         (name[1] == 'T' ? "ITreader" : "Ireader"));
  }
```

This code is used in section 31.

**35.**

```
⟨Allocate the caches 35⟩ ≡
  alloc_cache(ITcache, "ITcache");
  ITcache->filler.name = "ITfiller"; ITcache->filler.stage = fill_from_virt;
  alloc_cache(DTcache, "DTcache");
  DTcache->filler.name = "DTfiller"; DTcache->filler.stage = fill_from_virt;
  if (Icache) {
    alloc_cache(Icache, "Icache");
    Icache->filler.name = "Ifiller"; Icache->filler.stage = fill_from_mem;
  }
  if (Dcache) {
    alloc_cache(Dcache, "Dcache");
    Dcache->filler.name = "Dfiller"; Dcache->filler.stage = fill_from_mem;
    Dcache->flusher.name = "Dflusher"; Dcache->flusher.stage = flush_to_mem;
  }
  if (Scache) {
    alloc_cache(Scache, "Scache");
    if (Scache->bb < Icache->bb)
      panic(errprint0("Configuration error: Scache blocks smaller than Icache blocks"));
    if (Scache->bb < Dcache->bb)
      panic(errprint0("Configuration error: Scache blocks smaller than Dcache blocks"));
    if (Scache->gg != Dcache->gg)
      panic(errprint0("Configuration error: Scache granularity differs from the Dcache"));
    Icache->filler.stage = fill_from_S;
    Dcache->filler.stage = fill_from_S; Dcache->flusher.stage = flush_to_S;
    Scache->filler.name = "Sfiller"; Scache->filler.stage = fill_from_mem;
    Scache->flusher.name = "Sflusher"; Scache->flusher.stage = flush_to_mem;
  }
```

This code is used in section 38.

**36.** Now we are nearly done. The only nontrivial task remaining is to allocate the ring of queues for coroutine scheduling; for this we need to determine the maximum waiting time that will occur between scheduler and schedulee.

```
⟨Allocate the scheduling queue 36⟩ ≡
  bus_words = mem_bus_bytes >> 3;
  j = (mem_read_time < mem_write_time ? mem_write_time : mem_read_time);
  n = 1;
  if (Scache && Scache->bb > n) n = Scache->bb;
  if (Icache && Icache->bb > n) n = Icache->bb;
  if (Dcache && Dcache->bb > n) n = Dcache->bb;
  n = mem_addr_time + ((int) (n + bus_words - 1) / bus_words) * j;
  if (n > max_cycs) max_cycs = n;   /* now max_cycs bounds the waiting time */
  ring_size = max_cycs + 1;
  ring = (coroutine *) calloc(ring_size, sizeof(coroutine));
  if (!ring) panic(errprint0("Can't allocate the scheduling ring"));
  { register coroutine *p;
    for (p = ring; p < ring + ring_size; p++) {
      p->name = "";   /* header nodes are nameless */
      p->stage = max_stage;
    }
  }
```

This code is used in section 38.
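The bound on the waiting time in section 36 is easy to evaluate for sample parameters. The following sketch packages the same arithmetic as a function; the name `ring_size_for` and its parameter list are hypothetical, and the numbers in the usage note are arbitrary rather than the simulator's defaults.

```c
#include <assert.h>

/* Ring size computed as in section 36: one slot more than the largest delay.
   largest_bb is the biggest block size among the caches that are present
   (or 0 if there are none); max_cycs is the bound known so far. */
static int ring_size_for(int mem_addr_time, int mem_read_time,
                         int mem_write_time, int mem_bus_bytes,
                         int largest_bb, int max_cycs)
{
    int bus_words = mem_bus_bytes >> 3;
    int j = mem_read_time < mem_write_time ? mem_write_time : mem_read_time;
    int n = largest_bb > 1 ? largest_bb : 1;
    assert(bus_words > 0);  /* assumption: the bus carries at least one octabyte */
    n = mem_addr_time + ((n + bus_words - 1) / bus_words) * j;
    if (n > max_cycs) max_cycs = n;  /* the memory transfer dominates */
    return max_cycs + 1;
}
```

For example, with an address time of 20, read and write times of 20, an 8-byte bus, and a largest block of 64 bytes, the bound is 20 + 64 × 20 = 1300 cycles, so the ring needs 1301 entries unless *max\_cycs* was already larger.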

```
alloc_cache: static void (), §31.
bb: int, MMIX-PIPE §167.
bus_words: int, MMIX-PIPE §214.
c: cache *, §31.
calloc: void *(), <stdlib.h>.
coroutine = struct, MMIX-PIPE §23.
Dcache: cache *, MMIX-PIPE §168.
DTcache: cache *, MMIX-PIPE §168.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
fill_from_mem = 95, MMIX-PIPE §129.
fill_from_S = 94, MMIX-PIPE §129.
fill_from_virt = 93, MMIX-PIPE §129.
filler: coroutine, MMIX-PIPE §167.
flush_to_mem = 97, MMIX-PIPE §129.
flush_to_S = 96, MMIX-PIPE §129.
flusher: coroutine, MMIX-PIPE §167.
gg: int, MMIX-PIPE §167.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §31.
j: register int, §38.
max_cycs: int, §15.
max_stage = 99, MMIX-PIPE §129.
mem_addr_time: int, MMIX-PIPE §214.
mem_bus_bytes: int, §15.
mem_read_time: int, MMIX-PIPE §214.
mem_write_time: int, MMIX-PIPE §214.
n: register int, §38.
name: char *, §31.
name: char *, MMIX-PIPE §23.
panic = macro (), §8.
ports: int, MMIX-PIPE §167.
reader: coroutine *, MMIX-PIPE §167.
ring: coroutine *, MMIX-PIPE §29.
ring_size: int, MMIX-PIPE §29.
Scache: cache *, MMIX-PIPE §168.
stage: int, MMIX-PIPE §23.
vanish = 98, MMIX-PIPE §129.
```

**37.**

```
⟨Touch up last-minute trivia 37⟩ ≡
  if (hash_prime <= mem_chunks_max)
    panic(errprint0("Configuration error: hashprime must exceed memchunksmax"));
  mem_hash = (chunknode *) calloc(hash_prime + 1, sizeof(chunknode));
  if (!mem_hash) panic(errprint0("Can't allocate the hash table"));
  mem_hash[0].chunk = (octa *) calloc(1 << 13, sizeof(octa));
  if (!mem_hash[0].chunk) panic(errprint0("Can't allocate chunk 0"));
  mem_hash[hash_prime].chunk = (octa *) calloc(1 << 13, sizeof(octa));
  if (!mem_hash[hash_prime].chunk) panic(errprint0("Can't allocate 0 chunk"));
  mem_chunks = 1;
  fetch_bot = (fetch *) calloc(fetch_buf_size + 1, sizeof(fetch));
  if (!fetch_bot) panic(errprint0("Can't allocate the fetch buffer"));
  fetch_top = fetch_bot + fetch_buf_size;
  reorder_bot = (control *) calloc(reorder_buf_size + 1, sizeof(control));
  if (!reorder_bot) panic(errprint0("Can't allocate the reorder buffer"));
  reorder_top = reorder_bot + reorder_buf_size;
  wbuf_bot = (write_node *) calloc(write_buf_size + 1, sizeof(write_node));
  if (!wbuf_bot) panic(errprint0("Can't allocate the write buffer"));
  wbuf_top = wbuf_bot + write_buf_size;
  if (bp_n == 0) bp_table = NULL;
  else {   /* a branch prediction table is desired */
    if (bp_a + bp_b + bp_c > 31)
      panic(errprint0("Configuration error: Branch table has >= 2 gigabytes of data"));
    bp_table = (char *) calloc(1 << (bp_a + bp_b + bp_c), sizeof(char));
    if (!bp_table) panic(errprint0("Can't allocate the branch table"));
  }
  l = (specnode *) calloc(lring_size, sizeof(specnode));
  if (!l) panic(errprint0("Can't allocate local registers"));
  j = bus_words;
  if (Icache && (Icache->bb >> 3) > j) j = Icache->bb >> 3;
  fetched = (octa *) calloc(j, sizeof(octa));
  if (!fetched) panic(errprint0("Can't allocate prefetch buffer"));
  dispatch_stat = (int *) calloc(dispatch_max + 1, sizeof(int));
  if (!dispatch_stat) panic(errprint0("Can't allocate dispatch counts"));
  no_hardware_PT = 1 - hardware_PT;
```

This code is used in section 38.

**38.** Putting it all together. Here then is the desired configuration subroutine.

```
#include <stdio.h>    /* fopen, fgets, sscanf, rewind */
#include <stdlib.h>   /* calloc, exit */
#include <ctype.h>    /* isspace */
#include <string.h>   /* strcpy, strlen, strcmp */
#include <limits.h>   /* INT_MAX */
#include "mmix-pipe.h"
  ⟨Type definitions 12⟩
  ⟨Global variables 9⟩
  ⟨Subroutines 10⟩
  void MMIX_config(filename)
       char *filename;
  { register int i, j, n;
    config_file = fopen(filename, "r");
    if (!config_file) panic(errprint1("Can't open configuration file %s", filename));
    ⟨Initialize to defaults 17⟩;
    ⟨Count and allocate the functional units 18⟩;
    ⟨Record all the specs 19⟩;
    ⟨Allocate coroutines in each functional unit 26⟩;
    ⟨Allocate the caches 35⟩;
    ⟨Allocate the scheduling queue 36⟩;
    ⟨Touch up last-minute trivia 37⟩;
  }
```

```
bb: int, MMIX-PIPE §167.
bp_a: int, MMIX-PIPE §150.
bp_b: int, MMIX-PIPE §150.
bp\_c: int, MMIX-PIPE §150.
bp_n: int, MMIX-PIPE §150.
bp\_table: char *,
 MMIX-PIPE §150.
bus_words: int.
 MMIX-PIPE §214.
calloc: void *(), <stdlib.h>.
chunk: octa *, MMIX-PIPE §206.
chunknode = struct,
  MMIX-PIPE §206.
config_file: FILE *, \S9.
control = struct,
  MMIX-PIPE §44.
dispatch\_max: int,
  MMIX-PIPE §59.
dispatch\_stat: int *,
 MMIX-PIPE §66.
errprint0 = macro(), \S 8.
errprint1 = macro(), \S 8.
exit: void (), <stdlib.h>.
fetch = struct, MMIX-PIPE §68.
```

```
fetch_bot: fetch *,
  MMIX-PIPE §69.
fetch_buf_size: int, §15.
fetch_top: fetch *,
  MMIX-PIPE §69.
fetched: octa *,
  MMIX-PIPE \S 284.
fgets: \mathbf{char} *(), < \mathbf{stdio.h} >.
fopen: FILE *(), <stdio.h>.
hardware\_PT: int, \S 15.
hash_prime: int,
  MMIX-PIPE §207.
Icache: cache *,
  MMIX-PIPE §168.
INT_MAX = macro, <limits.h>.
isspace: int (), <ctype.h>.
l: specnode *, MMIX-PIPE §86.
lring_size: int, MMIX-PIPE §86.
mem_chunks: int,
  MMIX-PIPE §207.
mem_chunks_max: int,
  MMIX-PIPE §207.
mem_hash: chunknode *,
  MMIX-PIPE §207.
```

no\_hardware\_PT: bool, MMIX-PIPE §242. octa = struct, MMIX-PIPE §17. panic = macro (), §8. reorder\_bot: control \*, MMIX-PIPE §60. reorder\_buf\_size: int, §15. reorder\_top: control \*, MMIX-PIPE §60. rewind: void (), <stdio.h>. specnode = struct, MMIX-PIPE §40. sscanf: int (), <stdio.h>. strcmp: int (), <string.h>. strcpy: char \*(), <string.h>. strlen: size\_t (), <string.h>. wbuf\_bot: write\_node \*, MMIX-PIPE §247. wbuf\_top: write\_node \*, MMIX-PIPE §247. write\_buf\_size: int, §15. write\_node = struct, MMIX-PIPE §246.

## 39. Names of the sections.

```
⟨Allocate coroutines in each functional unit 26⟩ Used in section 38.
⟨Allocate reader coroutines for cache c 34⟩ Used in section 31.
⟨Allocate the cache sets for cache c 32⟩ Used in section 31.
⟨Allocate the caches 35⟩ Used in section 38.
⟨Allocate the scheduling queue 36⟩ Used in section 38.
⟨Allocate the victim cache for cache c 33⟩ Used in section 31.
⟨Build table of pipeline stages needed for each opcode 27⟩ Used in section 26.
⟨Count and allocate the functional units 18⟩ Used in section 38.
⟨Determine the number of stages, n, needed by funit[j] 29⟩ Used in section 26.
⟨Global variables 9, 15, 28⟩ Used in section 38.
⟨If token is a cache name, process a cache spec 21⟩ Used in section 19.
⟨If token is a parameter name, process a PV spec 20⟩ Used in section 19.
⟨If token is an operation name, process a pipe spec 24⟩ Used in section 19.
⟨Initialize to defaults 17⟩ Used in section 38.
⟨Process a functional spec 25⟩ Used in section 19.
⟨Record all the specs 19⟩ Used in section 38.
⟨Subroutines 10, 11, 16, 22, 23, 30, 31⟩ Used in section 38.
⟨Touch up last-minute trivia 37⟩ Used in section 38.
⟨Type definitions 12, 13, 14⟩ Used in section 38.
```

**1.** Introduction. This program module contains brute-force implementations of the ten input/output primitives defined at the beginning of MMIX-SIM. The subroutines are grouped here as a separate package, because they are intended to be loaded with the pipeline simulator as well as with the simple simulator.

```
⟨Preprocessor macros 2⟩
⟨Type definitions 3⟩
⟨External subroutines 4⟩
⟨Global variables 6⟩
⟨Subroutines 7⟩
```

**2.** Of course we include standard C library routines, and we set things up to accommodate older versions of C.

```
⟨Preprocessor macros 2⟩ ≡
#include <stdio.h>
#include <stdlib.h>
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
#ifndef FILENAME_MAX
#define FILENAME_MAX 256
#endif
#ifndef SEEK_SET
#define SEEK_SET 0
#endif
#ifndef SEEK_END
#define SEEK_END 2
#endif
```

This code is used in section 1.

**3.** The unsigned 32-bit type **tetra** must agree with its definition in the simulators.

```
⟨Type definitions 3⟩ ≡
  typedef unsigned int tetra;
  typedef struct {
    tetra h, l;
  } octa;   /* two tetrabytes make one octabyte */
```

See also section 5.

This code is used in section 1.

**4.** Three basic subroutines are used to get strings from the simulated memory and to put strings into that memory. These subroutines are defined appropriately in each simulator. We also use a few subroutines and constants defined in MMIX-ARITH.

```
⟨External subroutines 4⟩ ≡
  extern char stdin_chr ARGS((void));
  extern int mmgetchars ARGS((char *buf, int size, octa addr, int stop));
  extern void mmputchars ARGS((unsigned char *buf, int size, octa addr));
  extern octa oplus ARGS((octa, octa));
  extern octa ominus ARGS((octa, octa));
  extern octa incr ARGS((octa, int));
  extern octa zero_octa;   /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one;     /* neg_one.h = neg_one.l = -1 */
```

This code is used in section 1.

**5.** Each possible handle has a file pointer and a current mode.

```
⟨Type definitions 3⟩ +≡
  typedef struct {
    FILE *fp;   /* file pointer */
    int mode;   /* [read OK] + 2[write OK] + 4[binary] + 8[readwrite] */
  } sim_file_info;
```

**6.**

```
⟨Global variables 6⟩ ≡
  sim_file_info sfile[256];
```

See also sections 9 and 24.

This code is used in section 1.

**7.** The first three handles are initially open.

```
⟨Subroutines 7⟩ ≡
  void mmix_io_init ARGS((void));
  void mmix_io_init()
  {
    sfile[0].fp = stdin, sfile[0].mode = 1;
    sfile[1].fp = stdout, sfile[1].mode = 2;
    sfile[2].fp = stderr, sfile[2].mode = 2;
  }
```

See also sections 8, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, and 23.

This code is used in section 1.

```
__STDC__, Standard C.
FILE, <stdio.h>.
FILENAME_MAX = macro, <stdio.h>.
incr: octa (), MMIX-ARITH §6.
mmgetchars: int (), MMIX-SIM §114.
mmgetchars: int (), MMIX-PIPE §381.
mmputchars: void (), MMIX-SIM §117.
mmputchars: void (), MMIX-PIPE §384.
neg_one: octa, MMIX-ARITH §4.
ominus: octa (), MMIX-ARITH §5.
oplus: octa (), MMIX-ARITH §5.
SEEK_END = macro, <stdio.h>.
SEEK_SET = macro, <stdio.h>.
stderr: FILE *, <stdio.h>.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
stdout: FILE *, <stdio.h>.
zero_octa: octa, MMIX-ARITH §4.
```

**8.** The only tricky thing about these routines is that we want to protect the standard input, output, and error streams from being preempted.

```
⟨Subroutines 7⟩ +≡
  octa mmix_fopen ARGS((unsigned char, octa, octa));
  octa mmix_fopen(handle, name, mode)
       unsigned char handle;
       octa name, mode;
  {
    char name_buf[FILENAME_MAX];
    if (mode.h || mode.l > 4) goto abort;
    if (mmgetchars(name_buf, FILENAME_MAX, name, 0) == FILENAME_MAX) goto abort;
    if (sfile[handle].mode != 0 && handle > 2) fclose(sfile[handle].fp);
    sfile[handle].fp = fopen(name_buf, mode_string[mode.l]);
    if (!sfile[handle].fp) goto abort;
    sfile[handle].mode = mode_code[mode.l];
    return zero_octa;   /* success */
  abort: sfile[handle].mode = 0;
    return neg_one;     /* failure */
  }
```

**9.**

```
⟨Global variables 6⟩ +≡
  char *mode_string[] = {"r", "w", "rb", "wb", "w+b"};
  int mode_code[] = {0x1, 0x2, 0x5, 0x6, 0xf};
```

**10.** If the simulator is being used interactively, we can avoid competition for *stdin* by substituting another file.

```
⟨Subroutines 7⟩ +≡
  void mmix_fake_stdin ARGS((FILE *));
  void mmix_fake_stdin(f)
       FILE *f;
  {
    sfile[0].fp = f;   /* f should be open in mode "r" */
  }
```

**11.**

```
⟨Subroutines 7⟩ +≡
  octa mmix_fclose ARGS((unsigned char));
  octa mmix_fclose(handle)
       unsigned char handle;
  {
    if (sfile[handle].mode == 0) return neg_one;
    if (handle > 2 && fclose(sfile[handle].fp) != 0) return neg_one;
    sfile[handle].mode = 0;
    return zero_octa;   /* success */
  }
```
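The two tables of section 9 run in parallel: entry *k* of *mode\_string* is the argument passed to *fopen*, and entry *k* of *mode\_code* is the bit pattern stored in the handle, using the encoding [read OK] + 2[write OK] + 4[binary] + 8[readwrite] of section 5. The sketch below only looks the tables up; the enum names and the functions `fopen_mode_for` and `flags_for` are hypothetical and do not appear in MMIX-IO.

```c
#include <assert.h>
#include <stddef.h>

/* Copies of the two parallel tables from section 9. */
static const char *const mode_string[] = {"r", "w", "rb", "wb", "w+b"};
static const int mode_code[] = {0x1, 0x2, 0x5, 0x6, 0xf};

/* Hypothetical names for the bits of sim_file_info.mode. */
enum { READ_OK = 1, WRITE_OK = 2, BINARY = 4, READWRITE = 8 };

/* The fopen mode string for mode number 0..4, or NULL if out of range. */
static const char *fopen_mode_for(unsigned mode)
{
    return mode <= 4 ? mode_string[mode] : NULL;
}

/* The flag bits for mode number 0..4, or 0 (closed) if out of range. */
static int flags_for(unsigned mode)
{
    /* both tables must have the same number of entries */
    assert(sizeof mode_string / sizeof mode_string[0]
           == sizeof mode_code / sizeof mode_code[0]);
    return mode <= 4 ? mode_code[mode] : 0;
}
```

Mode 4, for instance, opens the file with `"w+b"` and sets all four bits, so the handle may be both read and written in binary.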

**12.**

```
⟨Subroutines 7⟩ +≡
  octa mmix_fread ARGS((unsigned char, octa, octa));
  octa mmix_fread(handle, buffer, size)
       unsigned char handle;
       octa buffer, size;
  {
    register unsigned char *buf;
    register int n;
    octa o;
    o = neg_one;
    if (!(sfile[handle].mode & 0x1)) goto done;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
    if (size.h) goto done;
    buf = (unsigned char *) calloc(size.l, sizeof(char));
    if (!buf) goto done;
    ⟨Read n ≤ size.l characters into buf 13⟩;
    mmputchars(buf, n, buffer);
    free(buf);
    o.h = 0, o.l = n;
  done: return ominus(o, size);
  }
```

**13.**

```
⟨Read n ≤ size.l characters into buf 13⟩ ≡
  if (sfile[handle].fp == stdin) {
    register unsigned char *p;
    for (p = buf, n = size.l; p < buf + n; p++) *p = stdin_chr();
  } else {
    clearerr(sfile[handle].fp);
    n = fread(buf, 1, size.l, sfile[handle].fp);
    if (ferror(sfile[handle].fp)) {
      free(buf);
      goto done;
    }
  }
```

This code is used in section 12.
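The value returned by *mmix\_fread* is *o* − *size*, where *o* is the number of bytes read, or −1 if the read failed. A complete read therefore yields 0, a short read yields a negative count, and a failure yields −1 − *size*. The sketch below works this out on (h, l) pairs; `octa_sub` is a hypothetical stand-in for *ominus* (which is defined in MMIX-ARITH and not shown here), `fread_result` is likewise hypothetical, and `uint32_t` is assumed for **tetra**.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;           /* assumption: exactly 32 bits */
typedef struct { tetra h, l; } octa;

/* Hypothetical stand-in for ominus: 64-bit difference y - z, computed
   on the two halves with a borrow from the high half when needed. */
static octa octa_sub(octa y, octa z)
{
    octa x;
    x.h = y.h - z.h;
    x.l = y.l - z.l;
    if (y.l < z.l) x.h--;         /* borrow out of the low half */
    return x;
}

/* Result of a read of n bytes out of size requested; ok == 0 means failure. */
static octa fread_result(int ok, tetra n, tetra size)
{
    octa o, s;
    assert(n <= size);
    if (ok) { o.h = 0; o.l = n; }
    else    { o.h = o.l = 0xffffffffu; }   /* neg_one */
    s.h = 0; s.l = size;
    return octa_sub(o, s);
}
```

Reading 3 bytes when 10 were requested gives −7, that is, h = #ffffffff and l = #fffffff9.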

```
ARGS = macro (), §2.
calloc: void *(), <stdlib.h>.
clearerr: void (), <stdio.h>.
fclose: int (), <stdio.h>.
ferror: int (), <stdio.h>.
FILE, <stdio.h>.
FILENAME_MAX = macro, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fp: FILE *, §5.
fread: size_t (), <stdio.h>.
free: void (), <stdlib.h>.
h: tetra, §3.
l: tetra, §3.
mmgetchars: int (), MMIX-SIM §114.
mmgetchars: int (), MMIX-PIPE §381.
mmputchars: void (), MMIX-SIM §117.
mmputchars: void (), MMIX-PIPE §384.
mode: int, §5.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
ominus: octa (), MMIX-ARITH §5.
sfile: sim_file_info [], §6.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
zero_octa: octa, MMIX-ARITH §4.
```

```
14. ⟨Subroutines 7⟩ +≡
  octa mmix_fgets ARGS((unsigned char, octa, octa));
  octa mmix_fgets(handle, buffer, size)
       unsigned char handle;
       octa buffer, size;
  {
    char buf[256];
    register int n, s;
    register char *p;
    octa o;
    int eof = 0;
    if (¬(sfile[handle].mode & #1)) return neg_one;
    if (¬size.l ∧ ¬size.h) return neg_one;
    if (sfile[handle].mode & #8) sfile[handle].mode &= ~#2;
    size = incr(size, -1);
    o = zero_octa;
    while (1) {
      ⟨Read n < 256 characters into buf 15⟩;
      mmputchars((unsigned char *) buf, n + 1, buffer);
      o = incr(o, n);
      size = incr(size, -n);
      if ((n ∧ buf[n - 1] ≡ '\n') ∨ (¬size.l ∧ ¬size.h) ∨ eof) return o;
      buffer = incr(buffer, n);
    }
  }

15. ⟨Read n < 256 characters into buf 15⟩ ≡
  s = 255;
  if (size.l < s ∧ ¬size.h) s = size.l;
  if (sfile[handle].fp ≡ stdin)
    for (p = buf, n = 0; n < s;) {
      *p = stdin_chr();
      n++;
      if (*p++ ≡ '\n') break;
    }
  else {
    if (¬fgets(buf, s + 1, sfile[handle].fp)) return neg_one;
    eof = feof(sfile[handle].fp);
    for (p = buf, n = 0; n < s;) {
      if (¬*p ∧ eof) break;
      n++;
      if (*p++ ≡ '\n') break;
    }
  }
  *p = '\0';
This code is used in section 14.
```

16. The routines that deal with wyde characters might need to be changed on a system that is little-endian; the author wishes good luck to whoever has to do this. MMIX is always big-endian, but external files prepared on random operating systems might be backwards.
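One way to sidestep the host's byte order is to assemble each wyde from its two bytes explicitly instead of reading it as a 16-bit integer. The following is only a sketch of that idea; the helper names `get_wyde_be`, `put_wyde_be`, and `is_wyde_newline` are hypothetical and do not appear in MMIXware.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of MMIXware: convert between a 16-bit
   wyde and its big-endian two-byte form, whatever the host's byte order. */
static uint16_t get_wyde_be(const unsigned char *p)
{
  return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

static void put_wyde_be(unsigned char *p, uint16_t w)
{
  p[0] = (unsigned char)(w >> 8);    /* most significant byte first */
  p[1] = (unsigned char)(w & 0xff);  /* least significant byte second */
}

/* A wyde newline is the byte pair 00 0a: a zero byte followed by '\n'. */
static int is_wyde_newline(const unsigned char *p)
{
  return get_wyde_be(p) == (uint16_t)'\n';
}
```

A file written by a little-endian program would store the same wyde as `0a 00`, so a port would have to swap the two bytes before applying a test of this kind.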

```
⟨Subroutines 7⟩ +≡
  octa mmix_fgetws ARGS((unsigned char, octa, octa));
  octa mmix_fgetws(handle, buffer, size)
       unsigned char handle;
       octa buffer, size;
  {
    char buf[256];
    register int n, s;
    register char *p;
    octa o;
    int eof = 0;
    if (¬(sfile[handle].mode & #1)) return neg_one;
    if (¬size.l ∧ ¬size.h) return neg_one;
    if (sfile[handle].mode & #8) sfile[handle].mode &= ~#2;
    buffer.l &= -2;
    size = incr(size, -1);
    o = zero_octa;
    while (1) {
      ⟨Read n < 128 wyde characters into buf 17⟩;
      mmputchars((unsigned char *) buf, 2 * n + 2, buffer);
      o = incr(o, n);
      size = incr(size, -n);
      if ((n ∧ buf[2 * n - 1] ≡ '\n' ∧ buf[2 * n - 2] ≡ 0) ∨ (¬size.l ∧ ¬size.h) ∨ eof)
        return o;
      buffer = incr(buffer, 2 * n);
    }
  }
```

```
ARGS = macro (), §2.
feof: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
fp: FILE *, §5.
h: tetra, §3.
incr: octa (), MMIX-ARITH §6.
l: tetra, §3.
mmputchars: void (), MMIX-SIM §117.
mmputchars: void (), MMIX-PIPE §384.
mode: int, §5.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
sfile: sim_file_info [], §6.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
zero_octa: octa, MMIX-ARITH §4.
```

```
17. ⟨Read n < 128 wyde characters into buf 17⟩ ≡
  s = 127;
  if (size.l < s ∧ ¬size.h) s = size.l;
  if (sfile[handle].fp ≡ stdin)
    for (p = buf, n = 0; n < s;) {
      *p++ = stdin_chr(); *p++ = stdin_chr();
      n++;   /* count the wyde just read */
      if (*(p - 1) ≡ '\n' ∧ *(p - 2) ≡ 0) break;
    }
  else
    for (p = buf, n = 0; n < s;) {
      if (fread(p, 1, 2, sfile[handle].fp) ≠ 2) {
        eof = feof(sfile[handle].fp);
        if (¬eof) return neg_one;
        break;
      }
      n++, p += 2;
      if (*(p - 1) ≡ '\n' ∧ *(p - 2) ≡ 0) break;
    }
  *p = *(p + 1) = '\0';
This code is used in section 16.
18. ⟨Subroutines 7⟩ +≡
  octa mmix_fwrite ARGS((unsigned char, octa, octa));
  octa mmix_fwrite(handle, buffer, size)
       unsigned char handle;
       octa buffer, size;
  {
    char buf[256];
    register int n;
    if (¬(sfile[handle].mode & #2)) return ominus(zero_octa, size);
    if (sfile[handle].mode & #8) sfile[handle].mode &= ~#1;
    while (1) {
      if (size.h ∨ size.l > 256) n = mmgetchars(buf, 256, buffer, -1);
      else n = mmgetchars(buf, size.l, buffer, -1);
      size = incr(size, -n);
      if (fwrite(buf, 1, n, sfile[handle].fp) ≠ n) return ominus(zero_octa, size);
      fflush(sfile[handle].fp);
      if (¬size.l ∧ ¬size.h) return zero_octa;
      buffer = incr(buffer, n);
    }
  }
19. ⟨Subroutines 7⟩ +≡
  octa mmix_fputs ARGS((unsigned char, octa));
  octa mmix_fputs(handle, string)
       unsigned char handle;
       octa string;
  {
    char buf[256];
    register int n;
    octa o;
    o = zero_octa;
    if (¬(sfile[handle].mode & #2)) return neg_one;
    if (sfile[handle].mode & #8) sfile[handle].mode &= ~#1;
    while (1) {
      n = mmgetchars(buf, 256, string, 0);
      if (fwrite(buf, 1, n, sfile[handle].fp) ≠ n) return neg_one;
      o = incr(o, n);
      if (n < 256) {
        fflush(sfile[handle].fp);
        return o;
      }
      string = incr(string, n);
    }
  }
20. ⟨Subroutines 7⟩ +≡
  octa mmix_fputws ARGS((unsigned char, octa));
  octa mmix_fputws(handle, string)
       unsigned char handle;
       octa string;
  {
    char buf[256];
    register int n;
    octa o;
    o = zero_octa;
    if (¬(sfile[handle].mode & #2)) return neg_one;
    if (sfile[handle].mode & #8) sfile[handle].mode &= ~#1;
    while (1) {
      n = mmgetchars(buf, 256, string, 1);
      if (fwrite(buf, 1, n, sfile[handle].fp) ≠ n) return neg_one;
      o = incr(o, n ≫ 1);
      if (n < 256) {
        fflush(sfile[handle].fp);
        return o;
      }
      string = incr(string, n);
    }
  }
```

```
ARGS = macro (), §2.
buf: char [], §16.
eof: int, §16.
feof: int (), <stdio.h>.
fflush: int (), <stdio.h>.
fp: FILE *, §5.
fread: size_t (), <stdio.h>.
fwrite: size_t (), <stdio.h>.
h: tetra, §3.
handle: unsigned char, §16.
incr: octa (), MMIX-ARITH §6.
l: tetra, §3.
mmgetchars: int (), MMIX-SIM §114.
mmgetchars: int (), MMIX-PIPE §381.
mode: int, §5.
n: register int, §16.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
ominus: octa (), MMIX-ARITH §5.
p: register char *, §16.
s: register int, §16.
sfile: sim_file_info [], §6.
size: octa, §16.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
zero_octa: octa, MMIX-ARITH §4.
```

```
21. #define sign_bit ((unsigned) #80000000)
⟨Subroutines 7⟩ +≡
  octa mmix_fseek ARGS((unsigned char, octa));
  octa mmix_fseek(handle, offset)
       unsigned char handle;
       octa offset;
  {
    if (¬(sfile[handle].mode & #4)) return neg_one;
    if (sfile[handle].mode & #8) sfile[handle].mode = #f;
    if (offset.h & sign_bit) {
      if (fseek(sfile[handle].fp, (int) offset.l + 1, SEEK_END) ≠ 0) return neg_one;
    } else {
      if (offset.h ∨ (offset.l & sign_bit)) return neg_one;
      if (fseek(sfile[handle].fp, (int) offset.l, SEEK_SET) ≠ 0) return neg_one;
    }
    return zero_octa;
  }

22. ⟨Subroutines 7⟩ +≡
  octa mmix_ftell ARGS((unsigned char));
  octa mmix_ftell(handle)
       unsigned char handle;
  {
    register long x;
    octa o;
    if (¬(sfile[handle].mode & #4)) return neg_one;
    x = ftell(sfile[handle].fp);
    if (x < 0) return neg_one;
    o.h = 0, o.l = x;
    return o;
  }
```

23. One last subroutine belongs here, just in case the user has modified the standard error handle.

```
⟨Subroutines 7⟩ +≡
  void print_trip_warning ARGS((int, octa));
  void print_trip_warning(n, loc)
       int n;
       octa loc;
  {
        trip_warning[n], loc.h, loc.l);
  }

24. ⟨Global variables 6⟩ +≡
  char *trip_warning[] = {"TRIP", "integer divide check", "integer overflow",
       "float-to-fix overflow", "invalid floating point operation",
       "floating point overflow", "floating point underflow",
       "floating point division by zero", "floating point inexact"};
```

## 25. Names of the sections.

```
ARGS = macro (), §2.
fp: FILE *, §5.
fprintf: int (), <stdio.h>.
fseek: int (), <stdio.h>.
ftell: long (), <stdio.h>.
h: tetra, §3.
l: tetra, §3.
mode: int, §5.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
SEEK_END = macro, <stdio.h>.
SEEK_SET = macro, <stdio.h>.
sfile: sim_file_info [], §6.
zero_octa: octa, MMIX-ARITH §4.
```

1. Memory-mapped input and output. This module supplies procedures for reading from and writing to MMIX memory addresses that exceed 48 bits. Such addresses are used by the operating system for input and output, so they require special treatment. At present only dummy versions of these routines are implemented. Users who need nontrivial versions of spec\_read and/or spec\_write should prepare their own and link them with the rest of the simulator.

Many I/O devices communicate via bytes or wydes or tetras instead of octabytes. So these prototype routines have a *size* parameter, to distinguish between the various kinds of quantities that MMIX wants to read from and write to the memory-mapped addresses.

```
#include <stdio.h>
#include "mmix-pipe.h" /* header file for all modules */
extern octa read_hex(); /* found in the main program module */
static char buf [20];
static char *kind[] = {"byte", "wyde", "tetra", "octa"};
extern octa shift_left ARGS((octa y, int s)); /* y ≪ s, 0 ≤ s ≤ 64 */
extern octa shift_right ARGS((octa y, int s, int u)); /* y ≫ s, signed if ¬u */
```

2. If the *interactive\_read\_bit* of the *verbose* control is set, the user is supposed to supply values dynamically. Otherwise zero is read.

```
octa spec_read ARGS((octa, int));
octa spec_read(addr, size)
     octa addr;
     int size;
{
  octa val;
  size &= #3, addr.l &= -(1 ≪ size);
  if (verbose & interactive_read_bit) {
    printf("** Read %s from loc %08x%08x: ", kind[size], addr.h, addr.l);
    fgets(buf, 20, stdin);
    val = read_hex(buf);
  } else val.l = val.h = 0;
  switch (size) {   /* the cases fall through deliberately */
  case 0: val.l &= #ff;
  case 1: val.l &= #ffff;
  case 2: val.h = 0;
  case 3: break;
  }
  if (verbose & show_spec_bit) {
    printf(" (spec_read ");
    switch (size) {
    case 0: printf("%02x", val.l); break;
    case 1: printf("%04x", val.l); break;
    case 2: printf("%08x", val.l); break;
    case 3: printf("%08x%08x", val.h, val.l); break;
```

**3.** The default *spec\_write* just reports its arguments, without actually writing anything.

```
void spec_write ARGS((octa, octa, int));
void spec_write(addr, val, size)
     octa addr, val;
     int size;
{
  if (verbose & show_spec_bit) {
    size &= #3, addr.l &= -(1 ≪ size);
    val = shift_right(val, (8 - (1 ≪ size) - (addr.l & 7)) ≪ 3, 1);
    printf(" (spec_write ");
    switch (size) {
    case 0: printf("%02x", val.l); break;
    case 1: printf("%04x", val.l); break;
    case 2: printf("%08x", val.l); break;
    case 3: printf("%08x%08x", val.h, val.l); break;
    }
    printf(" to %08x%08x at time %d)\n", addr.h, addr.l, ticks.l);
  }
}
```
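The shift count used by *spec\_write* may be easier to follow with native 64-bit arithmetic. In a big-endian octabyte, a field of $2^{size}$ bytes that starts at byte offset $k$ has $8-2^{size}-k$ bytes to its right, so shifting right by eight times that amount brings the field to the low-order end. The sketch below assumes a host with `uint64_t`; the function `field_of_octabyte` is a hypothetical illustration, not part of the simulator, and it adds a mask that the original code does not apply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the 2^size-byte field of the big-endian
   octabyte val that is addressed by addr (0 <= size <= 3). */
static uint64_t field_of_octabyte(uint64_t val, uint64_t addr, int size)
{
  int nbytes, shift;
  assert(size >= 0 && size <= 3);
  nbytes = 1 << size;
  addr &= ~(uint64_t)(nbytes - 1);            /* align, like addr.l &= -(1 << size) */
  shift = (8 - nbytes - (int)(addr & 7)) << 3; /* bits to the right of the field */
  val >>= shift;
  if (size < 3)                               /* mask added for this sketch only */
    val &= (((uint64_t)1) << (8 * nbytes)) - 1;
  return val;
}
```

For instance, with `val` equal to `0x0011223344556677`, the wyde at an address whose low three bits are 2 lies 4 bytes from the right end, so the shift count is 32.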

**4.** Incidentally, the combined address a and size s could be transmitted in 64 bits of an actual memory bus, because a is always a multiple of  $2^s$  that is less than  $2^{63}$ . Thus (a, s) can be packed neatly into the 64-bit number  $2a + 2^s$ . (Think about it.)
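To see why the packing works, note that $2a$ is a multiple of $2^{s+1}$, so the lowest 1 bit of $2a + 2^s$ sits exactly in position $s$; removing that bit and halving recovers $a$. Here is a small sketch of both directions, assuming a host with `uint64_t`. The names `pack_addr_size` and `unpack_addr_size` are hypothetical and appear nowhere in the simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Pack address a and size code s (0 <= s <= 3) into the number 2a + 2^s.
   Preconditions: a < 2^63 and a is a multiple of 2^s. */
static uint64_t pack_addr_size(uint64_t a, int s)
{
  assert(s >= 0 && s <= 3);
  assert((a >> 63) == 0);
  assert((a & ((((uint64_t)1) << s) - 1)) == 0);
  return 2 * a + (((uint64_t)1) << s);
}

/* Recover a and s from a packed value produced by pack_addr_size. */
static void unpack_addr_size(uint64_t packed, uint64_t *a, int *s)
{
  uint64_t low;
  int k = 0;
  assert(packed != 0);
  low = packed & (~packed + 1);             /* lowest 1 bit, which equals 2^s */
  while ((((uint64_t)1) << k) != low) k++;  /* k becomes log2(low) */
  *s = k;
  *a = (packed - low) >> 1;
}
```

For example, the octabyte at address `#1000` (so $s=3$) packs to `#2008`.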

```
ARGS = macro, MMIX-PIPE §6.
fgets: char *(), <stdio.h>.
h: tetra, MMIX-PIPE §17.
interactive_read_bit = 1 ≪ 5, MMIX-PIPE §8.
l: tetra, MMIX-PIPE §17.
octa = struct, MMIX-PIPE §17.
printf: int (), <stdio.h>.
read_hex: octa (), MMMIX §17.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
show_spec_bit = 1 ≪ 6, MMIX-PIPE §8.
stdin: FILE *, <stdio.h>.
ticks: Extern octa, MMIX-PIPE §87.
verbose: int, MMIX-PIPE §4.
```

1. Introduction. This program is the heart of the meta-simulator for the ultra-configurable MMIX pipeline: It defines the *MMIX\_run* routine, which does most of the work. Another routine, *MMIX\_init*, is also defined here, and so is a header file called mmix-pipe.h. The header file is used by the main routine and by other routines like *MMIX\_config*, which are compiled separately.

Readers of this program should be familiar with the explanation of MMIX architecture as presented in the main program module for MMMIX.

A lot of subtle things can happen when instructions are executed in parallel. Therefore this simulator ranks among the most interesting and instructive programs in the author's experience. The author has tried his best to make everything correct ... but the chances for error are great. Anyone who discovers a bug is therefore urged to report it as soon as possible; please see http://mmix.cs.hm.edu/bugs/ for instructions.

It sort of boggles the mind when one realizes that the present program might someday be translated by a C compiler for MMIX and used to simulate *itself*.

2. This high-performance prototype of MMIX achieves its efficiency by means of "pipelining," a technique of overlapping that is explained for the related DLX computer in Chapter 3 of Hennessy & Patterson's book *Computer Architecture* (second edition). Other techniques such as "dynamic scheduling" and "multiple issue," explained in Chapter 4 of that book, are used too.

One good way to visualize the procedure is to imagine that somebody has organized a high-tech car repair shop according to similar principles. There are eight independent functional units, which we can think of as eight groups of auto mechanics, each specializing in a particular task; each group has its own workspace with room to deal with one car at a time. Group F (the "fetch" group) is in charge of rounding up customers and getting them to enter the assembly-line garage in an orderly fashion. Group D (the "decode and dispatch" group) does the initial vehicle inspection and writes up an order that explains what kind of servicing is required. The vehicles go next to one of the four "execution" groups: Group X handles routine maintenance, while groups XF, XM, and XD are specialists in more complex tasks that tend to take longer. (The XF people are good at floating the points, while the XM and XD groups are experts in multilink suspensions and differentials.) When the relevant X group has finished its work, cars drive to M station, where they send or receive messages and possibly pay money to members of the "memory" group. Finally all necessary parts are installed by members of group W, the "write" group, and the car leaves the shop. Everything is tightly organized so that in most cases the cars move in synchronized fashion from station to station, at regular 100-nanocentury intervals.

In a similar way, most MMIX instructions can be handled in a five-stage pipeline, F-D-X-M-W, with X replaced by XF for floating-point addition or conversion, or by XM for multiplication, or by XD for division or square root. Each stage ideally takes one clock cycle, although XF, XM, and (especially) XD are slower. If the instructions enter in a suitable pattern, we might see one instruction being fetched, another being decoded, and up to four being executed, while another is accessing memory, and yet another is finishing up by writing new information into registers; all this is going on simultaneously during one clock cycle. Pipelining with eight separate stages might therefore make the machine run up to 8 times as fast as it could if each instruction were being dealt with individually and without overlap. (Well, perfect speedup turns out to be impossible, because of the shared M and W stages; the theory of knapsack programming, to be discussed in Section 7.7 of The Art of Computer Programming, tells us that the maximal achievable speedup is at most $8-1/p-1/q-1/r$ when XF, XM, and XD have delays bounded by $p$, $q$, and $r$ cycles. But we can achieve a factor of more than 7 if we are very lucky.)

Consider, for example, the ADD instruction. This instruction enters the computer's processing unit in F stage, taking only one clock cycle if it is in the cache of instructions recently seen. Then the D stage recognizes the command as an ADD and acquires the current values of \$Y and \$Z; meanwhile, of course, another instruction is being fetched by F. On the next clock cycle, the X stage adds the values together. This prepares the way for the M stage to watch for overflow and to get ready for any exceptional action that might be needed with respect to the settings of special register rA. Finally, on the fifth clock cycle, the sum is either written into \$X or the trip handler for integer overflow is invoked. Although this process has taken five clock cycles (that is, $5\upsilon$), the net increase in running time has been only $1\upsilon$.
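The difference between latency and throughput in the ADD example can be checked with a little arithmetic. Under the idealized assumption that every stage takes exactly one cycle and nothing ever stalls, the first instruction needs one cycle per stage and each later instruction finishes one cycle after its predecessor. The function name `ideal_pipeline_cycles` below is hypothetical; it models only this best case, not the simulator.

```c
#include <assert.h>

/* Cycles needed for n instructions to pass through an ideal pipeline of
   the given number of one-cycle stages with no stalls: the first
   instruction takes `stages` cycles, each further one adds a single cycle. */
static long ideal_pipeline_cycles(long stages, long n)
{
  assert(stages >= 1 && n >= 0);
  if (n == 0) return 0;
  return stages + n - 1;
}
```

With five stages, one ADD occupies five cycles from fetch to write, but a stream of 100 such instructions needs only 104 cycles in this model, so each additional instruction costs one cycle.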

Of course congestion can occur, inside a computer as in a repair shop. For example, auto parts might not be readily available; or a car might have to sit in D station while waiting to move to XM, thereby blocking somebody else from moving from F to D. Sometimes there won't necessarily be a steady stream of customers. In such cases the employees in some parts of the shop will occasionally be idle. But we assume that they always do their jobs as fast as possible, given the sequence of customers that they encounter. With a clever person setting up appointments—translation: with a clever programmer and/or compiler arranging MMIX instructions—the organization can often be expected to run at nearly peak capacity.

In fact, this program is designed for experiments with many kinds of pipelines, potentially using additional functional units (such as several independent X groups), and potentially fetching, dispatching, and executing several nonconflicting instructions simultaneously. Such complications make this program more difficult than a simple pipeline simulator would be, but they also make it a lot more instructive because we can get a better understanding of the issues involved if we are required to treat them in greater generality.

**3.** Here's the overall structure of the present program module.

```
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "abstime.h"

⟨Preprocessor definitions 6⟩
⟨Type definitions 11⟩
⟨Global variables 20⟩
⟨External variables 4⟩
⟨Internal prototypes 13⟩
⟨External prototypes 9⟩
⟨Subroutines 14⟩
⟨External routines 10⟩
```

4. The identifier Extern is used in MMIX-PIPE to declare variables that are accessed in other modules. Actually all appearances of 'Extern' are defined to be blank here, but 'Extern' will become 'extern' in the header file.

```
#define Extern /* blank for us, extern for them */
format Extern extern

\langle External variables 4\rangle \equiv
Extern int verbose; /* controls the level of diagnostic output */

See also sections 29, 59, 60, 66, 69, 77, 86, 87, 98, 115, 136, 150, 168, 207, 211, 214, 242, 247, 284, and 349.
```

This code is used in sections 3 and 5.

5. The header file repeats the basic definitions and declarations.

```
⟨mmix-pipe.h 5⟩ ≡
#define Extern extern
⟨Header definitions 6⟩
⟨Type definitions 11⟩
⟨External variables 4⟩
⟨External prototypes 9⟩
```
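The effect of the two definitions of Extern can be imitated inside a single file. The sketch below is only an illustration: it uses a hypothetical variable `demo_verbose` in place of *verbose* and expands the same declaration text twice, once as the defining module sees it and once as a client of the header sees it.

```c
/* What the defining module sees: Extern is blank, so the next line
   is an ordinary definition of the variable. */
#define Extern
Extern int demo_verbose;   /* hypothetical stand-in for verbose */
#undef Extern

/* What a client sees through the header: the same text is now only
   a declaration that refers to the definition above. */
#define Extern extern
Extern int demo_verbose;
#undef Extern

/* Both expansions name one and the same object. */
static int bump_demo_verbose(void)
{
  return ++demo_verbose;
}
```

Writing each declaration once and letting the macro decide between definition and declaration keeps the module and its header from drifting apart.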

**6.** Subroutines of this program are declared first with a prototype, as in ANSI C, then with an old-style C function definition. The following preprocessor commands make this work correctly with both new-style and old-style compilers.

```
⟨Header definitions 6⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
See also sections 7, 8, 52, 57, 129, and 166.
This code is used in sections 3 and 5.
```
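The doubled parentheses in a call like `ARGS((octa, int))` are what make this trick work: the whole parameter list, inner parentheses included, travels as a single macro argument. A minimal sketch follows, with a hypothetical function `demo_max`. Its definition is written in new style, unlike the definitions in this program, so that the example also compiles under C23, which no longer accepts old-style definitions.

```c
#ifdef __STDC__
#define ARGS(list) list   /* ANSI C: keep the parameter list */
#else
#define ARGS(list) ()     /* pre-ANSI C: leave the parentheses empty */
#endif

/* With an ANSI compiler this line expands to
      static int demo_max (int a, int b);
   and with an old-style compiler to
      static int demo_max ();                */
static int demo_max ARGS((int a, int b));

static int demo_max(int a, int b)
{
  return a > b ? a : b;
}
```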

MMIX_run: void (), §10.

octa = struct, §17.

7. Some of the names that are natural for this program are in conflict with library names on at least one of the host computers in the author's tests. So we bypass the library names here.

```
⟨ Header definitions 6 ⟩ +≡
#define random my_random
#define fsqrt my_fsqrt
#define div my_div
```
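The renaming works because the macro is defined after the library header has been read: from that point on, every later use of the name in our own source refers to the replacement. A small sketch follows, in which the replacement `my_div` merely returns a quotient; its body is an assumption made for the example and is not the simulator's routine.

```c
#include <stdlib.h>   /* declares the library's own div() */

#define div my_div    /* from here on, "div" in our source means my_div */

/* This really defines my_div, so it cannot clash with the library. */
static long div(long y, long z)
{
  return z ? y / z : 0;   /* hypothetical body: quotient, or 0 when z is 0 */
}
```

A later call such as `div(17, 5)` is therefore compiled as `my_div(17, 5)`.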

8. The amount of verbosity depends on the following bit codes.

```
⟨Header definitions 6⟩ +≡
#define issue_bit (1 ≪ 0)
    /* show control blocks when issued, deissued, committed */
#define pipe_bit (1 ≪ 1)   /* show the pipeline and locks on every cycle */
#define coroutine_bit (1 ≪ 2)   /* show the coroutines when started on every cycle */
#define schedule_bit (1 ≪ 3)   /* show the coroutines when scheduled */
#define uninit_mem_bit (1 ≪ 4)
    /* complain when reading from an uninitialized chunk of memory */
#define interactive_read_bit (1 ≪ 5)
    /* prompt user when reading from I/O location */
#define show_spec_bit (1 ≪ 6)
    /* display special read/write transactions as they happen */
#define show_pred_bit (1 ≪ 7)   /* display branch prediction details */
#define show_wholecache_bit (1 ≪ 8)
    /* display cache blocks even when their key tag is invalid */
```
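Because each code occupies its own bit, several kinds of output can be requested at once by or-ing codes together, and a single mask can test a whole group. The sketch below repeats four of the definitions in plain C; the function `wants_cycle_banner` is a hypothetical name for the kind of test that *MMIX\_run* makes before it prints a cycle heading.

```c
#define issue_bit     (1 << 0)
#define pipe_bit      (1 << 1)
#define coroutine_bit (1 << 2)
#define schedule_bit  (1 << 3)

/* Nonzero if any of the four per-cycle diagnostics has been requested. */
static int wants_cycle_banner(int verbose)
{
  return (verbose & (issue_bit | pipe_bit | coroutine_bit | schedule_bit)) != 0;
}
```

A setting such as `pipe_bit | (1 << 6)` asks for the pipeline display together with special read/write reports; only the first of these triggers the banner.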

**9.** The *MMIX\_init()* routine should be called exactly once, after *MMIX\_config()* has done its work but before the simulator starts to execute any programs. Then *MMIX\_run()* can be called as often as the user likes.

```
⟨ External prototypes 9⟩ ≡
Extern void MMIX_init ARGS((void));
Extern void MMIX_run ARGS((int cycs, octa breakpoint));
See also sections 38, 161, 175, 178, 180, 209, 212, and 252.
This code is used in sections 3 and 5.
```

```
10. ⟨External routines 10⟩ ≡
  void MMIX_init()
  {
    register int i, j;
    ⟨Initialize everything 22⟩;
  }

  void MMIX_run(cycs, breakpoint)
       int cycs;
       octa breakpoint;
  {
    ⟨Local variables 12⟩;
    while (cycs) {
      if (verbose & (issue_bit | pipe_bit | coroutine_bit | schedule_bit))
        printf("*** Cycle %d\n", ticks.l);
      ⟨Perform one machine cycle 64⟩;
      if (verbose & pipe_bit) {
        print_pipe(); print_locks();
      }
      if (breakpoint_hit ∨ halted) {
        if (breakpoint_hit)
          printf("Breakpoint instruction fetched at time %d\n", ticks.l - 1);
        if (halted) printf("Halted at time %d\n", ticks.l - 1);
        break;
      }
      cycs--;
    }
  cease:;
  }
See also sections 39, 162, 176, 179, 181, 210, 213, and 253.
This code is used in section 3.

11. ⟨Type definitions 11⟩ ≡
  typedef enum {
    false, true, wow
  } bool;   /* slightly extended booleans */
See also sections 17, 23, 37, 40, 44, 68, 76, 164, 167, 206, 246, and 371.
This code is used in sections 3 and 5.

12. ⟨Local variables 12⟩ ≡
  register int i, j, m;
  bool breakpoint_hit = false;
  bool halted = false;
See also sections 124 and 258.
This code is used in section 10.
```

13. Error messages that abort this program are called panic messages. The macro called *confusion* will never be needed unless this program is internally inconsistent.

```
#define errprint0(f) fprintf(stderr, f)
#define errprint1(f, a) fprintf(stderr, f, a)
#define errprint2(f, a, b) fprintf(stderr, f, a, b)
#define panic(x) { errprint0("Panic: "); x; errprint0("!\n"); expire(); }
#define confusion(m) errprint1("This can't happen: %s", m)
⟨Internal prototypes 13⟩ ≡
  static void expire ARGS((void));
See also sections 18, 24, 27, 30, 32, 34, 42, 45, 55, 62, 72, 90, 92, 94, 96, 156, 158, 169, 171, 173, 182,
     184, 186, 188, 190, 192, 195, 198, 200, 202, 204, 240, 250, 254, and 377.
This code is used in section 3.

14. ⟨Subroutines 14⟩ ≡
  static void expire()   /* the last gasp before dying */
  {
    if (ticks.h) errprint2("(Clock time is %dH+%d.)\n", ticks.h, ticks.l);
    else errprint1("(Clock time is %d.)\n", ticks.l);
    exit(-2);
  }
See also sections 19, 21, 25, 28, 31, 33, 35, 43, 46, 56, 63, 73, 91, 93, 95, 97, 157, 159, 170, 172, 174,
     183, 185, 187, 189, 191, 193, 196, 199, 201, 203, 205, 208, 241, 251, 255, 378, 379, 381, 384,
     and 387.
```

This code is used in section 3.

15. The data structures of this program are not precisely equivalent to logical gates that could be implemented directly in silicon; we will use data structures and algorithms appropriate to the C programming language. For example, we'll use pointers and arrays, instead of buses and ports and latches. However, the net effect of our data structures and algorithms is intended to be equivalent to the net effect of a silicon implementation. The methods used below are essentially equivalent to those used in real machines today, except that diagnostic facilities are added so that we can readily watch what is happening.

Each functional unit in the MMIX pipeline is programmed here as a coroutine in C. At every clock cycle, we will call on each active coroutine to do one phase of its operation; in terms of the repair-station analogy described in the main program, this corresponds to getting each group of auto mechanics to do one unit of operation on a car. The coroutines are performed sequentially, although a real pipeline would have them act in parallel. We will not "cheat" by letting one coroutine access a value early in its cycle that another one computes late in its cycle, unless computer hardware could "cheat" in an equivalent way.
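The "no cheating" rule can be illustrated in miniature. When stages are run one after another within a simulated cycle, a simple way to keep a later stage from seeing a value that an earlier stage has only just produced is to compute all new values from the old ones and then commit them together. The sketch below uses that double-buffering idea with a hypothetical three-stage structure `demo_pipe`; it is an assumption made for illustration and is not the mechanism that this program actually uses.

```c
#define NSTAGES 3

typedef struct {
  int cur[NSTAGES];    /* values visible to every stage during this cycle */
  int next[NSTAGES];   /* values being prepared for the following cycle */
} demo_pipe;

/* Simulate one clock cycle. Stage 0 accepts a new input; each later stage
   takes what its predecessor held at the START of the cycle and adds 1.
   Nothing in cur changes until all stages have been computed. */
static void demo_cycle(demo_pipe *p, int input)
{
  int i;
  p->next[0] = input;
  for (i = 1; i < NSTAGES; i++) p->next[i] = p->cur[i - 1] + 1;
  for (i = 0; i < NSTAGES; i++) p->cur[i] = p->next[i];   /* commit */
}
```

A value fed into stage 0 therefore needs three calls of `demo_cycle` to reach the last stage, just as it would if the stages were running in parallel hardware.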

```
ARGS = macro, §6.
coroutine_bit = 1 ≪ 2, §8.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
h: tetra, §17.
issue_bit = 1 ≪ 0, §8.
l: tetra, §17.
octa = struct, §17.
pipe_bit = 1 ≪ 1, §8.
print_locks: void (), §39.
print_pipe: void (), §253.
printf: int (), <stdio.h>.
schedule_bit = 1 ≪ 3, §8.
stderr: FILE *, <stdio.h>.
ticks: Extern octa, §87.
verbose: int, §4.
```

16. Low-level routines. Where should we begin? It is tempting to start with a global view of the simulator and then to break it down into component parts. But that task is too daunting, because there are so many unknowns about what basic ingredients ought to be combined when we construct the larger components. So let us look first at the primitive operations on which the superstructure will be built. Once we have created some infrastructure, we'll be able to proceed with confidence to the larger tasks ahead.

17. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing (1998–1999) was limited in that way. Details of the basic arithmetic appear in a separate program module called MMIX-ARITH, because the same routines are needed also for the assembler and for the non-pipelined simulator. The definition of type **tetra** should be changed, if necessary, to conform with the definitions found there.

```
⟨Type definitions 11⟩ +≡
  typedef unsigned int tetra;
     /* for systems conforming to the LP-64 data model */
  typedef struct {
    tetra h, l;
  } octa;   /* two tetrabytes make one octabyte */

18. ⟨Internal prototypes 13⟩ +≡
  static void print_octa ARGS((octa));

19. ⟨Subroutines 14⟩ +≡
  static void print_octa(o)
       octa o;
  {
    if (o.h) printf("%x%08x", o.h, o.l);
    else printf("%x", o.l);
  }

20. ⟨Global variables 20⟩ ≡
  extern octa zero_octa;   /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one;   /* neg_one.h = neg_one.l = -1 */
  extern octa aux;   /* auxiliary output of a subroutine */
  extern bool overflow;   /* set by certain subroutines for signed arithmetic */
  extern int exceptions;   /* bits set by floating point operations */
  extern int cur_round;   /* the current rounding mode */
See also sections 36, 41, 48, 50, 51, 53, 54, 65, 70, 78, 83, 88, 99, 107, 127, 148, 154, 194, 230, 235,
     238, 248, 285, 303, 305, 315, 374, 376, and 388.
This code is used in section 3.
```

**21.** Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y,z) returns the sum of octabytes y and z. Multiplication returns the high half of a product in the global variable aux; division returns the remainder in aux.
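To give the flavor of octabyte arithmetic carried out on 32-bit halves, here is a sketch of unsigned addition with carry propagation. It is not the code of MMIX-ARITH; the names `demo_tetra`, `demo_octa`, and `demo_oplus` are hypothetical, and `uint32_t` is assumed to be available.

```c
#include <stdint.h>

typedef uint32_t demo_tetra;                      /* stand-in for tetra */
typedef struct { demo_tetra h, l; } demo_octa;    /* stand-in for octa */

/* Unsigned 64-bit sum of y and z, computed on two 32-bit halves. */
static demo_octa demo_oplus(demo_octa y, demo_octa z)
{
  demo_octa x;
  x.l = y.l + z.l;                 /* low half, wraps modulo 2^32 */
  x.h = y.h + z.h + (x.l < y.l);   /* a wrapped low half means a carry */
  return x;
}
```

The comparison `x.l < y.l` detects the carry because an unsigned sum that wraps around is always smaller than either of its operands.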

```
⟨Subroutines 14⟩ +≡
  extern octa oplus ARGS((octa y, octa z));               /* unsigned y + z */
  extern octa ominus ARGS((octa y, octa z));              /* unsigned y - z */
  extern octa incr ARGS((octa y, int delta));             /* unsigned y + δ (δ is signed) */
  extern octa oand ARGS((octa y, octa z));                /* y & z */
  extern octa oandn ARGS((octa y, octa z));               /* y & ~z */
  extern octa shift_left ARGS((octa y, int s));           /* y << s, 0 < s < 64 */
  extern octa shift_right ARGS((octa y, int s, int u));   /* y >> s, signed if !u */
  extern octa omult ARGS((octa y, octa z));               /* unsigned (aux, x) = y × z */
  extern octa signed_omult ARGS((octa y, octa z));        /* signed x = y × z, setting overflow */
  extern octa odiv ARGS((octa x, octa y, octa z));        /* unsigned (x, y)/z; aux = (x, y) mod z */
  extern octa signed_odiv ARGS((octa y, octa z));         /* signed y/z, when z ≠ 0; aux = y mod z */
  extern int count_bits ARGS((tetra z));                  /* x = ν(z) */
  extern tetra byte_diff ARGS((tetra y, tetra z));        /* half of BDIF */
  extern tetra wyde_diff ARGS((tetra y, tetra z));        /* half of WDIF */
  extern octa bool_mult ARGS((octa y, octa z, bool xor)); /* MOR or MXOR */
  extern octa load_sf ARGS((tetra z));                    /* load short float */
  extern tetra store_sf ARGS((octa x));                   /* store short float */
  extern octa fplus ARGS((octa y, octa z));               /* floating point x = y ⊕ z */
  extern octa fmult ARGS((octa y, octa z));               /* floating point x = y ⊗ z */
  extern octa fdivide ARGS((octa y, octa z));             /* floating point x = y ⊘ z */
  extern octa froot ARGS((octa, int));                    /* floating point x = √z */
  extern octa fremstep ARGS((octa y, octa z, int delta)); /* floating point x rem z = y rem z */
  extern octa fintegerize ARGS((octa z, int mode));       /* floating point x = round(z) */
  extern int fcomp ARGS((octa y, octa z));
    /* -1, 0, 1, or 2 if y < z, y = z, y > z, y ∥ z */
  extern int fepscomp ARGS((octa y, octa z, octa eps, int sim));
    /* x = sim ? [y ∼ z (ε)] : [y ≈ z (ε)] */
  extern octa floatit ARGS((octa z, int mode, int unsgnd, int shrt));   /* fix to float */
  extern octa fixit ARGS((octa z, int mode));             /* float to fix */
```


This code is used in section 10.
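To make the idea of octabyte arithmetic on 32-bit pieces concrete, here is a small sketch of unsigned addition and subtraction. It is an illustration only: the bodies of the MMIX-ARITH routines are not shown in this section, so the names `oplus_sketch` and `ominus_sketch` and the carry and borrow tests below are assumptions, and `uint32_t` stands in for **tetra**.

```c
#include <stdint.h>

typedef uint32_t tetra;                /* assumption: exactly 32 bits */
typedef struct { tetra h, l; } octa;

/* Unsigned y + z, computed from 32-bit halves. */
static octa oplus_sketch(octa y, octa z)
{
  octa x;
  x.l = y.l + z.l;                 /* wraps modulo 2^32 */
  x.h = y.h + z.h + (x.l < y.l);   /* a carry occurred iff the low sum wrapped */
  return x;
}

/* Unsigned y - z, computed from 32-bit halves. */
static octa ominus_sketch(octa y, octa z)
{
  octa x;
  x.l = y.l - z.l;                 /* wraps modulo 2^32 */
  x.h = y.h - z.h - (y.l < z.l);   /* borrow iff the low half of y is smaller */
  return x;
}
```

Adding 1 to the octabyte with h = 0 and l = #ffffffff carries into the high half, and subtracting 1 from the result borrows it back.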

**22.** We had better check that our 32-bit assumption holds.

```
⟨Initialize everything 22⟩ ≡
  if (shift_left(neg_one, 1).h != 0xffffffff)
    panic(errprint0("Incorrect implementation of type tetra"));
See also sections 26, 61, 71, 79, 89, 116, 128, 153, 231, 236, 249, and 286.
```

23. Coroutines. As stated earlier, this program can be regarded as a system of interacting coroutines. Coroutines—sometimes called threads—are more or less independent processes that share and pass data and control back and forth. They correspond to the individual workers in an organization.

We don't need the full power of recursive coroutines, in which new threads are spawned dynamically and have independent stacks for computation; we are, after all, simulating a fixed piece of hardware. The total number of coroutines we deal with is established once and for all by the *MMIX\_config* routine, and each coroutine has a fixed amount of local data.

The simulation operates one clock tick at a time, by executing all coroutines scheduled for time t before advancing to time t + 1. The coroutines at time t may decide to become dormant or they may reschedule themselves and/or other coroutines for future times.

Each coroutine has a symbolic *name* for diagnostic purposes (e.g., ALU1); a nonnegative *stage* number (e.g., 2 for the second stage of a pipeline); a pointer to the next coroutine scheduled at the same time (or  $\Lambda$  if the coroutine is unscheduled); a pointer to a lock variable (or  $\Lambda$  if no lock is currently relevant); and a reference to a control block containing the data to be processed.

```
⟨Type definitions 11⟩ +≡
  typedef struct coroutine_struct {
    char *name;                          /* symbolic identification of a coroutine */
    int stage;                           /* its rank */
    struct coroutine_struct *next;       /* its successor */
    struct coroutine_struct **lockloc;   /* what it might be locking */
    struct control_struct *ctl;          /* its data */
  } coroutine;

24. ⟨Internal prototypes 13⟩ +≡
  static void print_coroutine_id ARGS((coroutine *));
  static void errprint_coroutine_id ARGS((coroutine *));
```

**26.** Coroutine control is masterminded by a ring of queues, one each for times t,  $t+1, \ldots, t+ring\_size-1$ , when t is the current clock time.

All scheduling is first-come-first-served, except that coroutines with higher *stage* numbers have priority. We want to process the later stages of a pipeline first, in this sequential implementation, for the same reason that a car must drive from M station into W station before another car can enter M station.

Each queue is a circular list of **coroutine** nodes, linked together by their *next* fields. A list head h with $stage = max\_stage$ comes at the end and the beginning of the queue. (All stage numbers of legitimate coroutines are less than $max\_stage$.) The queued items are $h\to next$, $h\to next\to next$, etc., from back to front, and we have $c\to stage \le c\to next\to stage$ unless c = h.

Initially all queues are empty.

```
⟨Initialize everything 22⟩ +≡
  { register coroutine *p;
    for (p = ring; p < ring + ring_size; p++) p->next = p;
  }
```

**27.** To schedule a coroutine c with positive delay  $d < ring\_size$ , we call the function schedule(c,d,s). (The s parameter is used only if scheduling is being logged; it does not affect the computation, but we will generally set s to the state at which the scheduled coroutine will begin.)

```
  while (p->next->stage < c->stage) p = p->next;
  c->next = p->next;
  p->next = c;
  if (verbose & schedule_bit) {
    printf(" scheduling "); print_coroutine_id(c);
    printf(" at time %d, state %d\n", ticks.l + d, s);
  }
}

29. ⟨External variables 4⟩ +≡
  Extern int ring_size;   /* set by MMIX_config, must be sufficiently large */
  Extern coroutine *ring;
  Extern int cur_time;
```
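The queue discipline described above can be exercised with a stripped-down model. The sketch below is not the simulator's code: `RING_SIZE`, `init_ring`, and `schedule_sketch` are hypothetical names, the coroutine record is reduced to the fields the queue needs, and the choice of slot `(cur_time + d) % RING_SIZE` for a delay of d ticks is an assumption made for this example. The insertion loop itself is the one shown in the fragment above.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8    /* hypothetical; the real size is set by MMIX_config */
#define MAX_STAGE 99   /* list heads carry this stage number */

typedef struct coroutine_struct {
  const char *name;
  int stage;
  struct coroutine_struct *next;
} coroutine;

static coroutine ring[RING_SIZE];   /* one list head per future clock time */
static int cur_time = 0;

/* Make every queue empty: each list head points to itself. */
static void init_ring(void)
{
  int t;
  for (t = 0; t < RING_SIZE; t++) {
    ring[t].name = "head";
    ring[t].stage = MAX_STAGE;
    ring[t].next = &ring[t];
  }
}

/* Queue c for time cur_time + d, keeping stages nondecreasing
   as we move from the list head toward the front of the queue. */
static void schedule_sketch(coroutine *c, int d)
{
  coroutine *p;
  assert(d > 0 && d < RING_SIZE);              /* sanity check on the delay */
  p = &ring[(cur_time + d) % RING_SIZE];       /* assumed slot for that time */
  while (p->next->stage < c->stage) p = p->next;
  c->next = p->next;
  p->next = c;
}
```

If coroutines A (stage 1), B (stage 2), and C (stage 1) are scheduled in that order for the same tick, the list becomes head, C, A, B, head. Read from front to back, B comes first because of its higher stage, and A precedes C because it was scheduled earlier.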

**30.** The all-important *ctl* field of a coroutine, which contains the data being manipulated, will be explained below. One of its key components is the *state* field, which helps to specify the next actions the coroutine will perform. When we schedule a coroutine for a new task, we often want it to begin in state 0.

```
⟨Internal prototypes 13⟩ +≡
  static void startup ARGS((coroutine *, int));

31. ⟨Subroutines 14⟩ +≡
  static void startup(c, d)
    coroutine *c;
    int d;
  {
    c->ctl->state = 0;
    schedule(c, d, 0);
  }
```


**32.** The following routine removes a coroutine from whatever queue it's in. The case $c\to next = c$ is also permitted; such a self-loop can occur when a coroutine goes to sleep and expects to be awakened (that is, scheduled) by another coroutine. Sleeping coroutines have important data in their *ctl* field; they are therefore quite different from unscheduled or "unemployed" coroutines, which have $c\to next = \Lambda$. An unemployed coroutine is not assumed to have any valid data in its *ctl* field.

**34.** When it is time to process all coroutines that have queued up for a particular time t, we empty the queue called ring[t] and link its items in the opposite order (from front to back). The following subroutine uses the well known algorithm discussed in exercise 2.2.3–7 of The Art of Computer Programming.

```
⟨Internal prototypes 13⟩ +≡
  static coroutine *queuelist ARGS((int));

35. ⟨Subroutines 14⟩ +≡
  static coroutine *queuelist(t)
    int t;
  {
    register coroutine *p, *q = &sentinel, *r;
    for (p = ring[t].next; p != &ring[t]; p = r) {
      r = p->next;
      p->next = q;
      q = p;
    }
    ring[t].next = &ring[t];
    sentinel.next = q;
    return q;
  }

36. ⟨Global variables 20⟩ +≡
  coroutine sentinel;   /* dummy coroutine at origin of circular list */
```
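The list reversal can be checked on a toy list. This sketch mirrors the loop of *queuelist* but is self-contained: the node type and the name `drain_reversed` are hypothetical, and the list head is passed as a parameter instead of being looked up in *ring*.

```c
#include <stddef.h>

typedef struct node_struct {
  int id;
  struct node_struct *next;
} node;

static node sentinel;   /* dummy node at the origin of the reversed list */

/* Empty the circular list headed by *head and return its former
   last item; the items end up linked in the opposite order, with
   the former first item pointing to the sentinel. */
static node *drain_reversed(node *head)
{
  node *p, *q = &sentinel, *r;
  for (p = head->next; p != head; p = r) {
    r = p->next;   /* remember the rest of the old list */
    p->next = q;   /* make this item point backward */
    q = p;
  }
  head->next = head;   /* the queue is empty again */
  sentinel.next = q;   /* the sentinel closes the circle */
  return q;
}
```

For a queue head, n1, n2, n3, head the function returns n3, and following *next* from there visits n2, n1, and finally the sentinel.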

**37.** Coroutines often start working on tasks that are *speculative*, in the sense that we want certain results to be ready if they prove to be useful; we understand that speculative computations might not actually be needed. Therefore a coroutine might need to be aborted before it has finished its work.

All coroutines must be written in such a way that important data structures remain intact even when the coroutine is abruptly terminated. In particular, we need to be sure that "locks" on shared resources are restored to an unlocked state when a coroutine holding the lock is aborted.

A **lockvar** variable is  $\Lambda$  when it is unlocked; otherwise it points to the coroutine responsible for unlocking it.

```
#define set_lock(c, l)     { l = c; (c)->lockloc = &(l); }
#define release_lock(c, l) { l = NULL; (c)->lockloc = NULL; }

⟨Type definitions 11⟩ +≡
  typedef coroutine *lockvar;

38. ⟨External prototypes 9⟩ +≡
  Extern void print_locks ARGS((void));

39. ⟨External routines 10⟩ +≡
  void print_locks()
  {
    print_cache_locks(ITcache);
    print_cache_locks(DTcache);
    print_cache_locks(Icache);
    print_cache_locks(Dcache);
    print_cache_locks(Scache);
    if (mem_lock) printf("mem locked by %s:%d\n", mem_lock->name, mem_lock->stage);
    if (dispatch_lock)
      printf("dispatch locked by %s:%d\n", dispatch_lock->name, dispatch_lock->stage);
    if (wbuf_lock)
      printf("head of write buffer locked by %s:%d\n", wbuf_lock->name, wbuf_lock->stage);
    if (clean_lock)
      printf("cleaner locked by %s:%d\n", clean_lock->name, clean_lock->stage);
    if (speed_lock)
      printf("write buffer flush locked by %s:%d\n", speed_lock->name, speed_lock->stage);
  }
```
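The point of the *lockloc* back pointer is that an aborted coroutine can be made to give up its lock without knowing which lock it held. The sketch below illustrates this with the two macros from above; the reduced coroutine record and the routine `abort_coroutine_sketch` are hypothetical and only suggest how an abort might restore the unlocked state.

```c
#include <stddef.h>

typedef struct coroutine_struct {
  const char *name;
  struct coroutine_struct **lockloc;   /* what it might be locking */
} coroutine;

typedef coroutine *lockvar;   /* NULL when unlocked, else the lock holder */

#define set_lock(c, l)     { l = c; (c)->lockloc = &(l); }
#define release_lock(c, l) { l = NULL; (c)->lockloc = NULL; }

/* Hypothetical abort: if the coroutine holds a lock, reach the lock
   variable through lockloc and clear both sides of the relationship. */
static void abort_coroutine_sketch(coroutine *c)
{
  if (c->lockloc) {
    *(c->lockloc) = NULL;
    c->lockloc = NULL;
  }
}
```

After `set_lock(&alu, mem_lock)` the variable `mem_lock` points to `alu` and `alu.lockloc` points back to `mem_lock`; calling `abort_coroutine_sketch(&alu)` leaves `mem_lock` unlocked again.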


**40.** Many of the quantities we deal with are speculative values that might not yet have been certified as part of the "real" calculation; in fact, they might not yet have been calculated.

A **spec** consists of a 64-bit quantity *o* and a pointer *p* to a **specnode**. The value *o* is meaningful only if the pointer *p* is $\Lambda$; otherwise *p* points to a source of further information.

A **specnode** is a 64-bit quantity o together with links to other **specnode**s that are above it or below it in a doubly linked list. An additional known bit tells whether the o field has been calculated. There also is a 64-bit addr field, to identify the list and give further information. A **specnode** list keeps track of speculative values related to a specific register or to all of main memory; we will discuss such lists in detail later.

```
⟨Type definitions 11⟩ +≡
  typedef struct {
    octa o;
    struct specnode_struct *p;
  } spec;
  typedef struct specnode_struct {
    octa o;
    bool known;
    octa addr;
    struct specnode_struct *up, *down;
  } specnode;

41. ⟨Global variables 20⟩ +≡
  spec zero_spec;   /* zero_spec.o.h = zero_spec.o.l = 0 and zero_spec.p = NULL */

42. ⟨Internal prototypes 13⟩ +≡
  static void print_spec ARGS((spec));

43. ⟨Subroutines 14⟩ +≡
  static void print_spec(s)
    spec s;
  {
    if (!s.p) print_octa(s.o);
    else {
      printf(">"); print_specnode_id(s.p->addr);
    }
  }
  static void print_specnode(s)
    specnode s;
  {
    if (s.known) { print_octa(s.o); printf("!"); }
    else if (s.o.h || s.o.l) { print_octa(s.o); printf("?"); }
    else printf("?");
    print_specnode_id(s.addr);
  }
```
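One way to read these definitions is as a small protocol for consumers of speculative values. The sketch below is an interpretation, not code from the simulator: the helper `spec_value` is hypothetical, `uint32_t` and `<stdbool.h>` stand in for the program's own **tetra** and **bool**, and the rule "use the specnode's *o* once *known* is set" is an assumption drawn from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tetra;
typedef struct { tetra h, l; } octa;

typedef struct specnode_struct {
  octa o;
  bool known;
  octa addr;
  struct specnode_struct *up, *down;
} specnode;

typedef struct {
  octa o;
  struct specnode_struct *p;
} spec;

/* Hypothetical helper: store the value of s in *out and return true
   if that value is available now; return false if it still depends
   on a specnode whose o field has not been calculated. */
static bool spec_value(spec s, octa *out)
{
  if (!s.p) { *out = s.o; return true; }          /* o is meaningful */
  if (s.p->known) { *out = s.p->o; return true; } /* producer has finished */
  return false;                                   /* must wait */
}
```

A consumer that gets `false` would have to stall or retry later; once the producing specnode sets *known*, the same call succeeds.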


**44.** The analog of an automobile in our simulator is a block of data called **control**, which represents all the relevant facts about an MMIX instruction. We can think of it as the work order attached to a car's windshield. Each group of employees updates the work order as the car moves through the shop.

A **control** record contains the original location of an instruction, and its four bytes OP X Y Z. An instruction has up to four inputs, which are **spec** records called y, z, b, and ra; it also has up to three outputs, which are **specnode** records called x, a, and rl. (We usually don't mention the special input ra or the special output rl, which refer to MMIX's internal registers rA and rL.) For example, the main inputs to a DIVU command are \$Y, \$Z, and rD; the outputs are the quotient \$X and the remainder rR. The inputs to a STO command are \$Y, \$Z, and \$X; there is one "output," and the field x.addr will be set to the physical address of the memory location corresponding to virtual address \$Y + \$Z.

Each **control** block also points to the coroutine that owns it, if any. And it has various other fields that contain other tidbits of information; for example, we have already mentioned the *state* field, which often governs a coroutine's actions. The i field, which contains an internal operation code number, is generally used together with *state* to switch between alternative computational steps. If, for example, the op field is SUB or SUBI or NEG or NEGI, the internal opcode i will be simply sub. We shall define all the fields of **control** records now and discuss them later.

An actual hardware implementation of MMIX wouldn't need all the information we are putting into a **control** block. Some of that information would typically be latched between stages of a pipeline; other portions would probably appear in so-called "rename registers." We simulate rename registers only indirectly, by counting how many registers of that kind would be in use if we were mimicking low-level hardware details more precisely. The *go* field is a **specnode** for convenience in programming, although we use only its *known* and *o* subfields. It generally contains the address of the subsequent instruction.

```
⟨Type definitions 11⟩ +≡
  ⟨Declare mmix_opcode and internal_opcode 47⟩
  typedef struct control_struct {
    octa loc;                      /* virtual address where an instruction originated */
    mmix_opcode op; unsigned char xx, yy, zz;   /* the original instruction bytes */
    spec y, z, b, ra;              /* inputs */
    specnode x, a, go, rl;         /* outputs */
    coroutine *owner;              /* a coroutine whose ctl this is */
    internal_opcode i;             /* internal opcode */
    int state;                     /* internal mindset */
    bool usage;                    /* should rU be increased? */
    bool need_b;                   /* should we stall until b.p == NULL? */
    bool need_ra;                  /* should we stall until ra.p == NULL? */
    bool ren_x;                    /* does x correspond to a rename register? */
    bool mem_x;                    /* does x correspond to a memory write? */
    bool ren_a;                    /* does a correspond to a rename register? */
    bool set_l;                    /* does rl correspond to a new value of rL? */
    bool interim;                  /* does this instruction need to be reissued on interrupt? */
    bool stack_alert;              /* is there potential for stack overflow? */
    unsigned int arith_exc;        /* arithmetic exceptions for event bits of rA */
    unsigned int hist;             /* history bits for use in branch prediction */
    int denin, denout;             /* execution time penalties for subnormal handling */
    octa cur_O, cur_S;             /* speculative rO and rS before this instruction */
    unsigned int interrupt;        /* does this instruction generate an interrupt? */
    void *ptr_a, *ptr_b, *ptr_c;   /* generic pointers for miscellaneous use */
  } control;
```
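The fields *op*, *xx*, *yy*, and *zz* hold the four bytes OP X Y Z of an instruction. Since an MMIX instruction is a tetrabyte whose most significant byte is OP, the split can be sketched as follows. The type `insn_bytes` and the function `split_instruction` are hypothetical names used only for this illustration; they are not the way the simulator fills in a **control** block.

```c
#include <stdint.h>

/* Hypothetical container for the four instruction bytes OP X Y Z. */
typedef struct {
  unsigned char op, xx, yy, zz;
} insn_bytes;

/* Split a 32-bit instruction word into its bytes, most significant first. */
static insn_bytes split_instruction(uint32_t inst)
{
  insn_bytes b;
  b.op = (unsigned char)(inst >> 24);
  b.xx = (unsigned char)((inst >> 16) & 0xff);
  b.yy = (unsigned char)((inst >> 8) & 0xff);
  b.zz = (unsigned char)(inst & 0xff);
  return b;
}
```

For the word #20010203 this yields op = #20, xx = 1, yy = 2, zz = 3; opcode #20 is ADD, the entry at position 32 (counting from zero) in the opcode list of section 47.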


```
45. ⟨Internal prototypes 13⟩ +≡
  static void print_control_block ARGS((control *));

46. ⟨Subroutines 14⟩ +≡
  static void print_control_block(c)
    control *c;
  {
    octa default_go;
    if (c->loc.h || c->loc.l || c->op || c->xx || c->yy || c->zz || c->owner) {
      print_octa(c->loc);
      printf(": %02x%02x%02x%02x(%s)", c->op, c->xx, c->yy, c->zz, internal_op_name[c->i]);
    }
    if (c->usage) printf("*");
    if (c->interim) printf("+");
    if (c->y.o.h || c->y.o.l || c->y.p) { printf(" y="); print_spec(c->y); }
    if (c->z.o.h || c->z.o.l || c->z.p) { printf(" z="); print_spec(c->z); }
    if (c->b.o.h || c->b.o.l || c->b.p || c->need_b) {
      printf(" b="); print_spec(c->b);
      if (c->need_b) printf("*");
    }
    if (c->need_ra) { printf(" rA="); print_spec(c->ra); }
    if (c->ren_x || c->mem_x) { printf(" x="); print_specnode(c->x); }
    else if (c->x.o.h || c->x.o.l) {
      printf(" x="); print_octa(c->x.o); printf("%c", c->x.known ? '!' : '?');
    }
    if (c->ren_a) { printf(" a="); print_specnode(c->a); }
    if (c->set_l) { printf(" rL="); print_specnode(c->rl); }
    if (c->interrupt) { printf(" int="); print_bits(c->interrupt); }
    if (c->arith_exc) { printf(" exc="); print_bits(c->arith_exc << 8); }
    default_go = incr(c->loc, 4);
    if (c->go.o.l != default_go.l || c->go.o.h != default_go.h) {
      printf(" ->"); print_octa(c->go.o);
    }
    if (verbose & show_pred_bit) printf(" hist=%x", c->hist);
    if (c->i == pop) {
      printf(" rS="); print_octa(c->cur_S);
      printf(" rO="); print_octa(c->cur_O);
    }
    printf(" state=%d", c->state);
  }
```


**47. Lists.** Here is a (boring) list of all the MMIX opcodes, in order.

```
⟨Declare mmix_opcode and internal_opcode 47⟩ ≡
  typedef enum {
    TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU,
    FLOT, FLOTI, FLOTU, FLOTUI, SFLOT, SFLOTI, SFLOTU, SFLOTUI,
    FMUL, FCMPE, FUNE, FEQLE, FDIV, FSQRT, FREM, FINT,
    MUL, MULI, MULU, MULUI, DIV, DIVI, DIVU, DIVUI,
    ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI,
    IIADDU, IIADDUI, IVADDU, IVADDUI, VIIIADDU, VIIIADDUI, XVIADDU, XVIADDUI,
    CMP, CMPI, CMPU, CMPUI, NEG, NEGI, NEGU, NEGUI,
    SL, SLI, SLU, SLUI, SR, SRI, SRU, SRUI,
    BN, BNB, BZ, BZB, BP, BPB, BOD, BODB,
    BNN, BNNB, BNZ, BNZB, BNP, BNPB, BEV, BEVB,
    PBN, PBNB, PBZ, PBZB, PBP, PBPB, PBOD, PBODB,
    PBNN, PBNNB, PBNZ, PBNZB, PBNP, PBNPB, PBEV, PBEVB,
    CSN, CSNI, CSZ, CSZI, CSP, CSPI, CSOD, CSODI,
    CSNN, CSNNI, CSNZ, CSNZI, CSNP, CSNPI, CSEV, CSEVI,
    ZSN, ZSNI, ZSZ, ZSZI, ZSP, ZSPI, ZSOD, ZSODI,
    ZSNN, ZSNNI, ZSNZ, ZSNZI, ZSNP, ZSNPI, ZSEV, ZSEVI,
    LDB, LDBI, LDBU, LDBUI, LDW, LDWI, LDWU, LDWUI,
    LDT, LDTI, LDTU, LDTUI, LDO, LDOI, LDOU, LDOUI,
    LDSF, LDSFI, LDHT, LDHTI, CSWAP, CSWAPI, LDUNC, LDUNCI,
    LDVTS, LDVTSI, PRELD, PRELDI, PREGO, PREGOI, GO, GOI,
    STB, STBI, STBU, STBUI, STW, STWI, STWU, STWUI,
    STT, STTI, STTU, STTUI, STO, STOI, STOU, STOUI,
    STSF, STSFI, STHT, STHTI, STCO, STCOI, STUNC, STUNCI,
    SYNCD, SYNCDI, PREST, PRESTI, SYNCID, SYNCIDI, PUSHGO, PUSHGOI,
    OR, ORI, ORN, ORNI, NOR, NORI, XOR, XORI,
    AND, ANDI, ANDN, ANDNI, NAND, NANDI, NXOR, NXORI,
    BDIF, BDIFI, WDIF, WDIFI, TDIF, TDIFI, ODIF, ODIFI,
    MUX, MUXI, SADD, SADDI, MOR, MORI, MXOR, MXORI,
    SETH, SETMH, SETML, SETL, INCH, INCMH, INCML, INCL,
    ORH, ORMH, ORML, ORL, ANDNH, ANDNMH, ANDNML, ANDNL,
    JMP, JMPB, PUSHJ, PUSHJB, GETA, GETAB, PUT, PUTI,
    POP, RESUME, SAVE, UNSAVE, SYNC, SWYM, GET, TRIP
  } mmix_opcode;
```

See also section 49.

This code is used in section 44.


```
48. ⟨Global variables 20⟩ +≡
  char *opcode_name[] = {
    "TRAP", "FCMP", "FUN", "FEQL", "FADD", "FIX", "FSUB", "FIXU",
    "FLOT", "FLOTI", "FLOTU", "FLOTUI", "SFLOT", "SFLOTI", "SFLOTU", "SFLOTUI",
    "FMUL", "FCMPE", "FUNE", "FEQLE", "FDIV", "FSQRT", "FREM", "FINT",
    "MUL", "MULI", "MULU", "MULUI", "DIV", "DIVI", "DIVU", "DIVUI",
    "ADD", "ADDI", "ADDU", "ADDUI", "SUB", "SUBI", "SUBU", "SUBUI",
    "2ADDU", "2ADDUI", "4ADDU", "4ADDUI", "8ADDU", "8ADDUI", "16ADDU", "16ADDUI",
    "CMP", "CMPI", "CMPU", "CMPUI", "NEG", "NEGI", "NEGU", "NEGUI",
    "SL", "SLI", "SLU", "SLUI", "SR", "SRI", "SRU", "SRUI",
    "BN", "BNB", "BZ", "BZB", "BP", "BPB", "BOD", "BODB",
    "BNN", "BNNB", "BNZ", "BNZB", "BNP", "BNPB", "BEV", "BEVB",
    "PBN", "PBNB", "PBZ", "PBZB", "PBP", "PBPB", "PBOD", "PBODB",
    "PBNN", "PBNNB", "PBNZ", "PBNZB", "PBNP", "PBNPB", "PBEV", "PBEVB",
    "CSN", "CSNI", "CSZ", "CSZI", "CSP", "CSPI", "CSOD", "CSODI",
    "CSNN", "CSNNI", "CSNZ", "CSNZI", "CSNP", "CSNPI", "CSEV", "CSEVI",
    "ZSN", "ZSNI", "ZSZ", "ZSZI", "ZSP", "ZSPI", "ZSOD", "ZSODI",
    "ZSNN", "ZSNNI", "ZSNZ", "ZSNZI", "ZSNP", "ZSNPI", "ZSEV", "ZSEVI",
    "LDB", "LDBI", "LDBU", "LDBUI", "LDW", "LDWI", "LDWU", "LDWUI",
    "LDT", "LDTI", "LDTU", "LDTUI", "LDO", "LDOI", "LDOU", "LDOUI",
    "LDSF", "LDSFI", "LDHT", "LDHTI", "CSWAP", "CSWAPI", "LDUNC", "LDUNCI",
    "LDVTS", "LDVTSI", "PRELD", "PRELDI", "PREGO", "PREGOI", "GO", "GOI",
    "STB", "STBI", "STBU", "STBUI", "STW", "STWI", "STWU", "STWUI",
    "STT", "STTI", "STTU", "STTUI", "STO", "STOI", "STOU", "STOUI",
    "STSF", "STSFI", "STHT", "STHTI", "STCO", "STCOI", "STUNC", "STUNCI",
    "SYNCD", "SYNCDI", "PREST", "PRESTI", "SYNCID", "SYNCIDI", "PUSHGO", "PUSHGOI",
    "OR", "ORI", "ORN", "ORNI", "NOR", "NORI", "XOR", "XORI",
    "AND", "ANDI", "ANDN", "ANDNI", "NAND", "NANDI", "NXOR", "NXORI",
    "BDIF", "BDIFI", "WDIF", "WDIFI", "TDIF", "TDIFI", "ODIF", "ODIFI",
    "MUX", "MUXI", "SADD", "SADDI", "MOR", "MORI", "MXOR", "MXORI",
    "SETH", "SETMH", "SETML", "SETL", "INCH", "INCMH", "INCML", "INCL",
    "ORH", "ORMH", "ORML", "ORL", "ANDNH", "ANDNMH", "ANDNML", "ANDNL",
    "JMP", "JMPB", "PUSHJ", "PUSHJB", "GETA", "GETAB", "PUT", "PUTI",
    "POP", "RESUME", "SAVE", "UNSAVE", "SYNC", "SWYM", "GET", "TRIP" };
```
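Because the enumeration and the name table list the opcodes in the same order, an opcode value serves directly as an index into the table. The following sketch shows the idea on the first eight opcodes only; the names `first_opcodes`, `first_opcode_name`, and `mnemonic_sketch` are hypothetical and exist just to keep the example short and self-contained.

```c
#include <stddef.h>

/* Excerpt: only the first row of the opcode list. */
typedef enum { TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU } first_opcodes;

static const char *first_opcode_name[] = {
  "TRAP", "FCMP", "FUN", "FEQL", "FADD", "FIX", "FSUB", "FIXU"
};

/* Return the mnemonic for op, or NULL if op lies outside this excerpt. */
static const char *mnemonic_sketch(unsigned op)
{
  size_t n = sizeof first_opcode_name / sizeof first_opcode_name[0];
  return op < n ? first_opcode_name[op] : NULL;
}
```

The correspondence holds only as long as both lists stay in step, which is why an entry that is dropped or duplicated in either list shifts every later name.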

**49.** And here is a (likewise boring) list of all the internal opcodes. The smallest numbers, less than or equal to $max\_pipe\_op$, correspond to operations for which arbitrary pipeline delays can be configured with $MMIX\_config$. The largest numbers, greater than $max\_real\_command$, correspond to internally generated operations that have no official OP code; for example, there are internal operations to shift the $\gamma$ pointer in the register stack, and to compute page table entries.

```
⟨Declare mmix_opcode and internal_opcode 47⟩ +≡
#define max_pipe_op feps
#define max_real_command trip
  typedef enum {
    mul0,       /* multiplication by zero */
    mul1, mul2, mul3, mul4, mul5, mul6, mul7, mul8,
                /* multiplication by 1–8, 9–16, ..., 57–64 bits */
    div,        /* DIV[U][I] */
    sh,         /* S[L,R][U][I] */
    mux,        /* MUX[I] */
    sadd,       /* SADD[I] */
    mor,        /* M[X]OR[I] */
    fadd,       /* FADD, FSUB */
    fmul,       /* FMUL */
    fdiv,       /* FDIV */
    fsqrt,      /* FSQRT */
    fint,       /* FINT */
    fix,        /* FIX[U] */
    flot,       /* [S]FLOT[U][I] */
    feps,       /* FCMPE, FUNE, FEQLE */
    fcmp,       /* FCMP */
    funeq,      /* FUN, FEQL */
    fsub,       /* FSUB */
    frem,       /* FREM */
    mul,        /* MUL[I] */
    mulu,       /* MULU[I] */
    divu,       /* DIVU[I] */
    add,        /* ADD[I] */
    addu,       /* [2,4,8,16,]ADDU[I], INC[M][H,L] */
    sub,        /* SUB[I], NEG[I] */
    subu,       /* SUBU[I], NEGU[I] */
    set,        /* SET[M][H,L], GETA[B] */
    or,         /* OR[I], OR[M][H,L] */
    orn,        /* ORN[I] */
    nor,        /* NOR[I] */
    and,        /* AND[I] */
    andn,       /* ANDN[I], ANDN[M][H,L] */
    nand,       /* NAND[I] */
    xor,        /* XOR[I] */
    nxor,       /* NXOR[I] */
    shlu,       /* SLU[I] */
    shru,       /* SRU[I] */
    shl,        /* SL[I] */
    shr,        /* SR[I] */
    cmp,        /* CMP[I] */
    cmpu,       /* CMPU[I] */
    bdif,       /* BDIF[I] */
    wdif,       /* WDIF[I] */
    tdif,       /* TDIF[I] */
    odif,       /* ODIF[I] */
    zset,       /* ZS[N][N,Z,P][I], ZSEV[I], ZSOD[I] */
    cset,       /* CS[N][N,Z,P][I], CSEV[I], CSOD[I] */
    get,        /* GET */
    put,        /* PUT[I] */
    ld,         /* LD[B,W,T,O][U][I], LDHT[I], LDSF[I] */
    ldptp,      /* load page table pointer */
    ldpte,      /* load page table entry */
    ldunc,      /* LDUNC[I] */
    ldvts,      /* LDVTS[I] */
    preld,      /* PRELD[I] */
    prest,      /* PREST[I] */
    st,         /* STO[U][I], STCO[I], STUNC[I] */
    syncd,      /* SYNCD[I] */
    syncid,     /* SYNCID[I] */
    pst,        /* ST[B,W,T][U][I], STHT[I] */
    stunc,      /* STUNC[I], in write buffer */
    cswap,      /* CSWAP[I] */
    br,         /* B[N][N,Z,P][B] */
    pbr,        /* PB[N][N,Z,P][B] */
    pushj,      /* PUSHJ[B] */
    go,         /* GO[I] */
    prego,      /* PREGO[I] */
    pushgo,     /* PUSHGO[I] */
    pop,        /* POP */
    resume,     /* RESUME */
    save,       /* SAVE */
    unsave,     /* UNSAVE */
    sync,       /* SYNC */
    jmp,        /* JMP[B] */
    noop,       /* SWYM */
    trap,       /* TRAP */
    trip,       /* TRIP */
    incgamma,   /* increase γ pointer */
    decgamma,   /* decrease γ pointer */
    incrl,      /* increase rL and β */
    sav,        /* intermediate stage of SAVE */
    unsav,      /* intermediate stage of UNSAVE */
    resum       /* intermediate stage of RESUME */
  } internal_opcode;
```
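The two boundary macros split the enumeration into three bands. As a minimal sketch of how a test against those boundaries might look, assuming the numeric values that follow from the enumeration order ($div=9$, $feps=21$, $add=29$, $trip=83$, $incgamma=84$, $resum=89$): the `op_` prefix and both helper functions are hypothetical names introduced only for this illustration and are not part of the simulator.

```c
#include <stdbool.h>

/* A few internal opcodes with the values implied by the enumeration order.
   The op_ prefix is an assumption of this sketch (it avoids clashing with
   the C library's div); the simulator itself uses the bare names. */
enum {
  op_div = 9, op_feps = 21, op_fcmp = 22, op_add = 29,
  op_trip = 83, op_incgamma = 84, op_resum = 89
};

#define max_pipe_op op_feps
#define max_real_command op_trip

/* Hypothetical helper: can MMIX_config assign arbitrary pipeline delays? */
static bool has_configurable_pipeline(int op) {
  return op <= max_pipe_op;
}

/* Hypothetical helper: is this an internally generated operation
   that has no official OP code? */
static bool is_internal_only(int op) {
  return op > max_real_command;
}
```

Under these assumptions `has_configurable_pipeline(op_div)` holds while `has_configurable_pipeline(op_add)` does not, and `is_internal_only` is true exactly for the six opcodes from `incgamma` through `resum`.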

**50.**

```
⟨Global variables 20⟩ +≡
  char *internal_op_name[] = {"mul0", "mul1", "mul2", "mul3", "mul4", "mul5", "mul6",
       "mul7", "mul8", "div", "sh", "mux", "sadd", "mor", "fadd", "fmul", "fdiv",
       "fsqrt", "fint", "fix", "flot", "feps", "fcmp", "funeq", "fsub", "frem", "mul",
       "mulu", "divu", "add", "addu", "sub", "subu", "set", "or", "orn", "nor", "and",
       "andn", "nand", "xor", "nxor", "shlu", "shru", "shl", "shr", "cmp", "cmpu",
       "bdif", "wdif", "tdif", "odif", "zset", "cset", "get", "put", "ld", "ldptp",
       "ldpte", "ldunc", "ldvts", "preld", "prest", "st", "syncd", "syncid", "pst",
       "stunc", "cswap", "br", "pbr", "pushj", "go", "prego", "pushgo", "pop", "resume",
       "save", "unsave", "sync", "jmp", "noop", "trap", "trip", "incgamma", "decgamma",
       "incrl", "sav", "unsav", "resum" };
```

**51.** We need a table to convert the external opcodes to internal ones.

```
⟨Global variables 20⟩ +≡
  internal_opcode internal_op[256] = {
  trap, fcmp, funeq, funeq, fadd, fix, fsub, fix,
  flot, flot, flot, flot, flot, flot, flot, flot,
  fmul, feps, feps, feps, fdiv, fsqrt, frem, fint,
  mul, mul, mulu, mulu, div, div, divu, divu,
  add, add, addu, addu, sub, sub, subu, subu,
  addu, addu, addu, addu, addu, addu, addu, addu,
  cmp, cmp, cmpu, cmpu, sub, sub, subu, subu,
  shl, shl, shlu, shlu, shr, shr, shru, shru,
  br, br, br, br, br, br, br, br,
  br, br, br, br, br, br, br, br,
  pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
  pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
  cset, cset, cset, cset, cset, cset, cset, cset,
  cset, cset, cset, cset, cset, cset, cset, cset,
  zset, zset, zset, zset, zset, zset, zset, zset,
  zset, zset, zset, zset, zset, zset, zset, zset,
  ld, ld, ld, ld, ld, ld, ld, ld,
  ld, ld, ld, ld, ld, ld, ld, ld,
  ld, ld, ld, ld, cswap, cswap, ldunc, ldunc,
  ldvts, ldvts, preld, preld, prego, prego, go, go,
  pst, pst, pst, pst, pst, pst, pst, pst,
  pst, pst, pst, pst, st, st, st, st,
  pst, pst, pst, pst, st, st, st, st,
  syncd, syncd, prest, prest, syncid, syncid, pushgo, pushgo,
  or, or, orn, orn, nor, nor, xor, xor,
  and, and, andn, andn, nand, nand, nxor, nxor,
  bdif, bdif, wdif, wdif, tdif, tdif, odif, odif,
  mux, mux, sadd, sadd, mor, mor, mor, mor,
  set, set, set, set, addu, addu, addu, addu,
  or, or, or, or, andn, andn, andn, andn,
  jmp, jmp, pushj, pushj, set, set, put, put,
  pop, resume, save, unsave, sync, noop, get, trip };
```
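The conversion table is indexed by the OP byte, which occupies the most significant eight bits of a 32-bit MMIX instruction; the X, Y, and Z bytes follow. Here is a minimal sketch of that lookup, restricted to the row for opcodes 0x20 through 0x27 so that it stays short. The type `tetra_fields`, the functions `split_tetra` and `internal_name`, and the use of name strings in place of enumeration values are all assumptions of this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the four byte fields of one instruction. */
typedef struct { uint8_t op, x, y, z; } tetra_fields;

static tetra_fields split_tetra(uint32_t inst) {
  tetra_fields f;
  f.op = (uint8_t)(inst >> 24);   /* external opcode, the table index */
  f.x  = (uint8_t)(inst >> 16);
  f.y  = (uint8_t)(inst >> 8);
  f.z  = (uint8_t)inst;
  return f;
}

/* Excerpt of the table: the row for external opcodes 0x20..0x27
   (ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI). */
static const char *const row_0x20[8] = {
  "add", "add", "addu", "addu", "sub", "sub", "subu", "subu"
};

/* Returns the internal name for opcodes covered by the excerpt, else NULL. */
static const char *internal_name(uint8_t op) {
  return (op >= 0x20 && op <= 0x27) ? row_0x20[op - 0x20] : NULL;
}
```

For instance, the tetrabyte 0x20010203 encodes ADD \$1,\$2,\$3; splitting it yields OP = 0x20, and the excerpt maps both 0x20 (ADD) and 0x21 (ADDI) to the single internal opcode $add$.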

```
add = 29, §49.      addu = 30, §49.     and = 37, §49.      andn = 38, §49.
bdif = 48, §49.     br = 69, §49.       cmp = 46, §49.      cmpu = 47, §49.
cset = 53, §49.     cswap = 68, §49.    div = 9, §49.       divu = 28, §49.
fadd = 14, §49.     fcmp = 22, §49.     fdiv = 16, §49.     feps = 21, §49.
fint = 18, §49.     fix = 19, §49.      flot = 20, §49.     fmul = 15, §49.
frem = 25, §49.     fsqrt = 17, §49.    fsub = 24, §49.     funeq = 23, §49.
get = 54, §49.      go = 72, §49.       internal_opcode = enum, §49.
jmp = 80, §49.      ld = 56, §49.       ldunc = 59, §49.    ldvts = 60, §49.
mor = 13, §49.      mul = 26, §49.      mulu = 27, §49.     mux = 11, §49.
nand = 39, §49.     noop = 81, §49.     nor = 36, §49.      nxor = 41, §49.
odif = 51, §49.     or = 34, §49.       orn = 35, §49.      pbr = 70, §49.
pop = 75, §49.      prego = 73, §49.    preld = 61, §49.    prest = 62, §49.
pst = 66, §49.      pushgo = 74, §49.   pushj = 71, §49.    put = 55, §49.
resume = 76, §49.   sadd = 12, §49.     save = 77, §49.     set = 33, §49.
shl = 44, §49.      shlu = 42, §49.     shr = 45, §49.      shru = 43, §49.
st = 63, §49.       sub = 31, §49.      subu = 32, §49.     sync = 79, §49.
syncd = 64, §49.    syncid = 65, §49.   tdif = 50, §49.     trap = 82, §49.
trip = 83, §49.     unsave = 78, §49.   wdif = 49, §49.     xor = 40, §49.
zset = 52, §49.
```

**52.** While we're into boring lists, we might as well define all the special register numbers, together with an inverse table for use in diagnostic outputs. These codes have been designed so that special registers 0–7 are unencumbered, 9–11 can't be PUT by anybody, 8 and 12–18 can't be PUT by the user. Pipeline delays might occur when GET is applied to special registers 21–31 or when PUT is applied to special registers 8 or 15–20. The SAVE and UNSAVE commands store and restore special registers 0–6 and 23–27.

```
⟨Header definitions 6⟩ +≡
#define rA 21     /* arithmetic status register */
#define rB 0      /* bootstrap register (trip) */
#define rC 8      /* continuation register */
#define rD 1      /* dividend register */
#define rE 2      /* epsilon register */
#define rF 22     /* failure location register */
#define rG 19     /* global threshold register */
#define rH 3      /* himult register */
#define rI 12     /* interval counter */
#define rJ 4      /* return-jump register */
#define rK 15     /* interrupt mask register */
#define rL 20     /* local threshold register */
#define rM 5      /* multiplex mask register */
#define rN 9      /* serial number */
#define rO 10     /* register stack offset */
#define rP 23     /* prediction register */
#define rQ 16     /* interrupt request register */
#define rR 6      /* remainder register */
#define rS 11     /* register stack pointer */
#define rT 13     /* trap address register */
#define rU 17     /* usage counter */
#define rV 18     /* virtual translation register */
#define rW 24     /* where-interrupted register (trip) */
#define rX 25     /* execution register (trip) */
#define rY 26     /* Y operand (trip) */
#define rZ 27     /* Z operand (trip) */
#define rBB 7     /* bootstrap register (trap) */
#define rTT 14    /* dynamic trap address register */
#define rWW 28    /* where-interrupted register (trap) */
#define rXX 29    /* execution register (trap) */
#define rYY 30    /* Y operand (trap) */
#define rZZ 31    /* Z operand (trap) */
```

**53.**

```
⟨Global variables 20⟩ +≡
  char *special_name[32] = { "rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB", "rC", "rN",
       "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
       "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};
```
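The numbering rules stated at the start of this section can be turned into simple range tests. The following is a minimal sketch under the assumption that only those stated ranges matter; the function names `put_allowed` and `saved_by_save` are hypothetical, and any further checks that a real PUT performs on the value being stored are deliberately not modeled.

```c
#include <stdbool.h>

/* Hypothetical helper: may PUT write special register k?
   "privileged" stands for an instruction that is not running as the user. */
static bool put_allowed(int k, bool privileged) {
  if (k < 0 || k > 31) return false;         /* not a special register */
  if (k >= 9 && k <= 11) return false;       /* rN, rO, rS: nobody may PUT */
  if (k == 8 || (k >= 12 && k <= 18))        /* rC and rI..rV */
    return privileged;                       /* the user may not PUT these */
  return true;                               /* 0..7 and 19..31 */
}

/* Hypothetical helper: is special register k stored by SAVE and
   restored by UNSAVE? */
static bool saved_by_save(int k) {
  return (k >= 0 && k <= 6) || (k >= 23 && k <= 27);
}
```

With the register numbers defined above, `put_allowed(9, true)` is false because rN can never be changed, whereas `put_allowed(8, true)` is true and `put_allowed(8, false)` is false for rC.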

**54.** Here are the bit codes that affect trips and traps. The first eight cases also apply to the upper half of rQ; the next eight apply to rA.

```
#define P_BIT (1 << 0)      /* instruction in privileged location */
#define S_BIT (1 << 1)      /* security violation */
#define B_BIT (1 << 2)      /* instruction breaks the rules */
#define K_BIT (1 << 3)      /* instruction for kernel only */
#define N_BIT (1 << 4)      /* virtual translation bypassed */
#define PX_BIT (1 << 5)     /* permission lacking to execute from page */
#define PW_BIT (1 << 6)     /* permission lacking to write on page */
#define PR_BIT (1 << 7)     /* permission lacking to read from page */
#define PROT_OFFSET 5       /* distance from PR_BIT to protection code position */
#define X_BIT (1 << 8)      /* floating inexact */
#define Z_BIT (1 << 9)      /* floating division by zero */
#define U_BIT (1 << 10)     /* floating underflow */
#define O_BIT (1 << 11)     /* floating overflow */
#define I_BIT (1 << 12)     /* floating invalid operation */
#define W_BIT (1 << 13)     /* float-to-fix overflow */
#define V_BIT (1 << 14)     /* integer overflow */
#define D_BIT (1 << 15)     /* integer divide check */
#define H_BIT (1 << 16)     /* trip handler bit */
#define F_BIT (1 << 17)     /* forced trap bit */
#define E_BIT (1 << 18)     /* external (dynamic) trap bit */
⟨Global variables 20⟩ +≡
  char bit_code_map[] = "EFHDVWIOUZXrwxnkbsp";
```

**55.**

```
⟨Internal prototypes 13⟩ +≡
  static void print_bits ARGS((int));
```

**56.**

```
⟨Subroutines 14⟩ +≡
  static void print_bits(x)
       int x;
  {
    register int b, j;
    for (j = 0, b = E_BIT; (x & (b + b - 1)) && b; j++, b >>= 1)
      if (x & b) printf("%c", bit_code_map[j]);
  }
```

**57.** The lower half of rQ holds external interrupts of highest priority. Most of them are implementation-dependent, but a few are defined in general.

```
⟨Header definitions 6⟩ +≡
#define POWER_FAILURE (1 << 0)        /* try to shut down calmly and quickly */
#define PARITY_ERROR (1 << 1)         /* try to save the file systems */
#define NONEXISTENT_MEMORY (1 << 2)   /* a memory address can't be used */
#define REBOOT_SIGNAL (1 << 4)        /* it's time to start over */
#define INTERVAL_TIMEOUT (1 << 6)     /* the timer register, rI, has reached zero */
#define STACK_OVERFLOW (1 << 7)       /* data has been stored on the rC page */
```
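The loop in $print\_bits$ walks from the highest bit code, `E_BIT`, down to the lowest, and it stops as soon as no bit at or below the current position remains set in $x$. Here is a minimal sketch of the same loop that writes into a caller-supplied buffer instead of calling `printf`, which makes the mapping easy to check; the function name `bits_to_string` and the buffer convention are assumptions of this illustration.

```c
#define E_BIT (1 << 18)   /* highest bit code, as defined above */

static const char bit_code_map[] = "EFHDVWIOUZXrwxnkbsp";

/* Hypothetical variant of print_bits: stores the code letters of the bits
   set in x into buf (which must hold at least 20 chars) and returns buf. */
static char *bits_to_string(int x, char *buf) {
  int b, j, n = 0;
  /* x & (b + b - 1) keeps only bit b and the bits below it, so the loop
     ends once nothing remains to be reported; the test on b ends it
     after bit 0 has been examined */
  for (j = 0, b = E_BIT; (x & (b + b - 1)) && b; j++, b >>= 1)
    if (x & b) buf[n++] = bit_code_map[j];
  buf[n] = '\0';
  return buf;
}
```

For example, integer overflow together with floating inexact, `(1 << 14) | (1 << 8)`, comes out as `VX`, and the combination `(1 << 7) | (1 << 0)` of a missing read permission and a privileged location comes out as `rp`.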

**58. Dynamic speculation.** Now that we understand some basic low-level structures, we're ready to look at the larger picture.

This simulator is based on the idea of "dynamic scheduling with register renaming," as introduced in the 1960s by R. M. Tomasulo [IBM Journal of Research and Development 11 (1967), 25–33]. Moreover, the dynamic scheduling method is extended here to "speculative execution," as implemented in several processors of the 1990s and described in section 4.6 of Hennessy and Patterson's Computer Architecture, second edition (1995). The essential idea is to keep track of the pipeline contents by recording all dependencies between unfinished computations in a queue called the reorder buffer. An entry in the reorder buffer might, for example, correspond to an instruction that adds together two numbers whose values are still being computed; those numbers have been allocated space in earlier positions of the reorder buffer. The addition will take place as soon as both of its operands are known, but the sum won't be written immediately into the destination register. It will stay in the reorder buffer until reaching the hot seat at the front of the queue. Finally, the addition leaves the hot seat and is said to be committed.

Some instructions in the reorder buffer may in fact be executed only on speculation, meaning that they won't really be called for unless a prior branch instruction has the predicted outcome. Indeed, we can say that all instructions not yet in the hot seat are being executed speculatively, because an external interrupt might occur at any time and change the entire course of computation. Organizing the pipeline as a reorder buffer allows us to look ahead and keep busy computing values that have a good chance of being needed later, instead of waiting for slow instructions or slow memory references to be completed.

The reorder buffer is in fact a queue of **control** records, conceptually forming part of a circle of such records inside the simulator, corresponding to all instructions that have been dispatched or *issued* but not yet committed, in strict program order.

The best way to get an understanding of speculative execution is perhaps to imagine that the reorder buffer is large enough to hold hundreds of instructions in various stages of execution, and to think of an implementation of MMIX that has dozens of functional units—more than would ever actually be built into a chip. Then one can readily visualize the kinds of control structures and checks that must be made to ensure correct execution. Without such a broad viewpoint, a programmer or hardware designer will be inclined to think only of the simple cases and to devise algorithms that lack the proper generality. Thus we have a somewhat paradoxical situation in which a difficult general problem turns out to be easier to solve than its simpler special cases, because it enforces clarity of thinking.

Instructions that have completed execution and have not yet been committed are analogous to cars that have gone through our hypothetical repair shop and are waiting for their owners to pick them up. However, all analogies break down, and the world of automobiles does not have a natural counterpart for the notion of speculative execution. That notion corresponds roughly to situations in which people are led to believe that their cars need a new piece of equipment, but they suddenly change their mind once they see the price tag, and they insist on having the equipment removed even after it has been partially or completely installed.

Speculatively executed instructions might make no sense: They might divide by zero or refer to protected memory areas, etc. Such anomalies are not considered catastrophic or even exceptional until the instruction reaches the hot seat.

The person who designs a computer with speculative execution is an optimist, who has faith that the vast majority of the machine's predictions will come true. The person who designs a reliable implementation of such a computer is a pessimist, who understands that all predictions might come to naught. The pessimist does, however, take pains to optimize the cases that do turn out well.

**59.** Let's consider what happens to a single instruction, say ADD \$1,\$2,\$3, as it travels through the pipeline in a normal situation. The first time this instruction is encountered, it is placed into the I-cache (that is, the instruction cache), so that we won't have to access memory when we need to perform it again. We will assume for simplicity in this discussion that each I-cache access takes one clock cycle, although other possibilities are allowed by *MMIX\_config*.

Suppose the simulated machine fetches the example ADD instruction at time 1000. Fetching is done by a coroutine whose *stage* number is 0. A cache block typically contains 8 or 16 instructions. The fetch unit of our machine is able to fetch up to *fetch\_max* instructions on each clock cycle and place them in the fetch buffer, provided that there is room in the buffer and that all the instructions belong to the same cache block.

The dispatch unit of our simulator is able to issue up to dispatch\_max instructions on each clock cycle and move them from the fetch buffer to the reorder buffer, provided that functional units are available for those instructions and there is room in the reorder buffer. A functional unit that handles ADD is usually called an ALU (arithmetic logic unit), and our simulated machine might have several of them. If they aren't all stalled in stage 1 of their pipelines, and if the reorder buffer isn't full, and if the machine isn't in the process of deissuing instructions that were mispredicted, and if fewer than dispatch\_max instructions are ahead of the ADD in the fetch buffer, and if all such prior instructions can be issued without using up all the free ALUs, our ADD instruction will be issued at time 1001. (In fact, all of these conditions are usually true.)

We assume that L > 3, so that \$1, \$2, and \$3 are local registers. For simplicity we'll assume in fact that the register stack is empty, so that the ADD instruction is supposed to set  $l[1] \leftarrow l[2] + l[3]$ . The operands l[2] and l[3] might not be known at time 1001; they are **spec** values, which might point to **specnode** entries in the reorder buffer for previous instructions whose destinations are l[2] and l[3]. The dispatcher fills the next available control block of the reorder buffer with information for the ADD, containing appropriate **spec** values corresponding to l[2] and l[3] in its y and z fields. The x field of this control block will be inserted into a doubly linked list of **specnode** records, corresponding to l[1] and to all instructions in the reorder buffer that have l[1] as a destination. The boolean value x.known will be set to false, meaning that this speculative value still needs to be computed. Subsequent instructions that need l[1] as a source will point to x, if they are issued before the sum x.o has been computed. Double linking is used in the **specnode** list because the ADD instruction might be cancelled before it is finally committed; thus deletions might occur at either end of the list for l[1].

At time 1002, the ALU handling the ADD will stall if its inputs y and z are not both known (namely if  $y.p \neq \Lambda$  or  $z.p \neq \Lambda$ ). In fact, it will also stall if its third input rA is not known; the current speculative value of rA, except for its event bits, is represented in the ra field of the control block, and we must have  $ra.p \equiv \Lambda$ . In such a case the ALU will look to see if the **spec** values pointed to by y.p and/or z.p and/or ra.p become defined on this clock cycle, and it will update its own input values accordingly.

But let's assume that y, z, and ra are already known at time 1002. Then x.o will be set to y.o + z.o and x.known will become true. This will make the result destined for l[1] available to be used in other commands at time 1003.
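The stall-or-compute decision just described can be sketched in a few lines. This is only an illustration under simplifying assumptions: the value field `o` is a plain 64-bit integer rather than an **octa**, the **spec** and **specnode** types are cut down to the two fields used here, the third input rA is left out, and the function name `alu_add_step` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the simulator's types (assumptions). */
typedef struct { uint64_t o; bool known; } specnode;   /* a pending result */
typedef struct { uint64_t o; specnode *p; } spec;      /* p == NULL: o is known */

/* One clock cycle of an ALU working on an ADD.
   Returns false if the ALU must stall, true once x has been computed. */
static bool alu_add_step(spec *y, spec *z, specnode *x) {
  /* pick up any producer result that has become defined */
  if (y->p && y->p->known) { y->o = y->p->o; y->p = NULL; }
  if (z->p && z->p->known) { z->o = z->p->o; z->p = NULL; }
  if (y->p || z->p) return false;   /* an operand is still unknown: stall */
  x->o = y->o + z->o;               /* the sum, modulo 2^64 */
  x->known = true;                  /* later instructions may now use x */
  return true;
}
```

Calling the function once per simulated cycle reproduces the behavior in the text: it keeps returning false while either producer is unfinished, and on the first cycle in which both operands are available it fills in `x.o` and sets `x.known`.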

If no overflow occurs when adding y.o to z.o, the interrupt and arith\_exc fields of the control block for ADD are set to zero. But when overflow does occur (shudder), there are two cases, based on the V-enable bit of rA, which is found in field b.o of the control block. If this bit is 0, the V-bit of the arith\_exc field in the control block is set to 1; the arith\_exc field will be ored into rA when the ADD instruction is eventually committed. But if the V-enable bit is 1, the trip handler should be called, interrupting the normal sequence. In such a case, the interrupt field of the control block is set to specify a trip, and the fetcher and dispatcher are told to forget what they have been doing; all instructions following the ADD in the reorder buffer must now be deissued. The virtual starting address of the overflow trip handler, namely location 32, is hastily passed to the fetch routine, and instructions will be fetched from that location as soon as possible. (Of course the overflow and the trip handler are still speculative until the ADD instruction is committed. Other exceptional conditions might cause the ADD itself to be terminated before it gets to the hot seat. But the pipeline keeps charging ahead, always trying to guess the most probable outcome.)
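The two overflow cases can be sketched as follows. The bit layouts of the $interrupt$ and $arith\_exc$ fields are not spelled out at this point, so the sketch records the outcome in plain boolean fields instead; the struct `add_outcome`, its field names, and the function `add_checked` are all hypothetical, and the V-enable bit is passed in as a parameter rather than being read from rA.

```c
#include <stdbool.h>
#include <stdint.h>

#define OVERFLOW_TRIP_ADDR 32   /* overflow trip handler location, per the text */

/* Hypothetical summary of what the control block would record. */
typedef struct {
  uint64_t x;          /* the sum modulo 2^64 */
  bool v_event;        /* V event to be ored into rA at commit time */
  bool trip;           /* a trip was requested; later instructions deissued */
  uint64_t fetch_loc;  /* where fetching resumes if trip is set */
} add_outcome;

static add_outcome add_checked(uint64_t y, uint64_t z, bool v_enabled) {
  add_outcome r = {0, false, false, 0};
  r.x = y + z;   /* unsigned arithmetic wraps without undefined behavior */
  /* signed overflow: the operands agree in sign but the sum does not */
  bool overflow = (((y ^ r.x) & (z ^ r.x)) >> 63) != 0;
  if (overflow) {
    if (!v_enabled) r.v_event = true;   /* just remember the event */
    else {                              /* interrupt the normal sequence */
      r.trip = true;
      r.fetch_loc = OVERFLOW_TRIP_ADDR;
    }
  }
  return r;
}
```

Both outcomes remain speculative in the simulator until the ADD reaches the hot seat; the sketch only shows which of the two records is made.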

The commission unit of this simulator is able to commit and/or deissue up to commit\_max instructions on each clock cycle. With luck, fewer than commit\_max instructions will be ahead of our ADD instruction at time 1003, and they will all be completed normally. Then l[1] can be set to x.o, and the event bits of rA can be updated from arith\_exc, and the ADD command can pass through the hot seat and out of the reorder buffer.

```
arith_exc: unsigned int, §44.
b: spec, §44.
Extern = macro, §4.
false = 0, §11.
interrupt: unsigned int, §44.
known: bool, §40.
MMIX_config: void (), MMIX-CONFIG §38.
o: octa, §40.
p: specnode *, §40.
ra: spec, §44.
stage: int, §23.
true = 1, §11.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
```

**60.** The instruction currently occupying the hot seat is the only issued-but-not-yet-committed instruction that is guaranteed to be truly essential to the machine's computation. All other instructions in the reorder buffer are being executed on speculation; if they prove to be needed, well and good, but we might want to jettison them all if, say, an external interrupt occurs.

Thus all instructions that change the global state in complicated ways—like LDVTS, which changes the virtual address translation caches—are performed only when they reach the hot seat. Fortunately the vast majority of instructions are sufficiently simple that we can deal with them more efficiently while other computations are taking place.

In this implementation the reorder buffer is simply housed in an array of control records. The first array element is  $reorder\_bot$ , and the last is  $reorder\_top$ . Variable hot points to the control block in the hot seat, and hot-1 to its predecessor, etc. Variable cool points to the next control block that will be filled in the reorder buffer. If  $hot \equiv cool$  the reorder buffer is empty; otherwise it contains the control records hot, hot-1, ..., cool+1, except of course that we wrap around from  $reorder\_bot$  to  $reorder\_top$  when moving down in the buffer.

```
⟨External variables 4⟩ +≡
  Extern control *reorder_bot, *reorder_top;
    /* least and greatest entries in the ring containing the reorder buffer */
  Extern control *hot, *cool;   /* front and rear of the reorder buffer */
  Extern control *old_hot;      /* value of hot at beginning of cycle */
  Extern int deissues;          /* the number of instructions that need to be deissued */
```

**61.**

```
⟨Initialize everything 22⟩ +≡
  hot = cool = reorder_top; deissues = 0;
```

**62.**

```
⟨Internal prototypes 13⟩ +≡
  static void print_reorder_buffer ARGS((void));
```

**63.**

```
⟨Subroutines 14⟩ +≡
  static void print_reorder_buffer()
  {
    printf("Reorder buffer");
    if (hot == cool) printf(" (empty)\n");
    else { register control *p;
      if (deissues) printf(" (%d to be deissued)", deissues);
      if (doing_interrupt) printf(" (interrupt state %d)", doing_interrupt);
      printf(":\n");
      for (p = hot; p != cool; p = (p == reorder_bot ? reorder_top : p - 1)) {
        print_control_block(p);
        if (p->owner) {
          printf(" "); print_coroutine_id(p->owner);
        }
        printf("\n");
      }
    }
    printf(" %d available rename register%s, %d memory slot%s\n",
        rename_regs, rename_regs != 1 ? "s" : "", mem_slots, mem_slots != 1 ? "s" : "");
  }
```
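The ring discipline of the reorder buffer, in which both $hot$ and $cool$ move downward and wrap from $reorder\_bot$ to $reorder\_top$, can be sketched with a toy array. The **control** record is reduced to a placeholder here, and the helper names `ring_prev` and `ring_count` are hypothetical; they are not functions of the simulator.

```c
#include <stddef.h>

typedef struct { int id; } control;   /* placeholder for the real record */

/* Move one position down the ring, wrapping from bot to top. */
static control *ring_prev(control *p, control *bot, control *top) {
  return p == bot ? top : p - 1;
}

/* Count the entries hot, hot-1, ..., cool+1 (zero when hot == cool). */
static size_t ring_count(control *hot, control *cool,
                         control *bot, control *top) {
  size_t n = 0;
  for (control *p = hot; p != cool; p = ring_prev(p, bot, top)) n++;
  return n;
}
```

Issuing an instruction corresponds to filling the record at `cool` and then replacing `cool` by `ring_prev(cool, ...)`; committing corresponds to replacing `hot` in the same way. The traversal in `ring_count` is the same one that $print\_reorder\_buffer$ uses.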

**64.** Here is an overview of what happens on each clock cycle.

```
⟨Perform one machine cycle 64⟩ ≡
  {
    ⟨Check for external interrupt 314⟩;
    dispatch_count = 0;
    old_hot = hot;     /* remember the hot seat position at beginning of cycle */
    old_tail = tail;   /* remember the fetch buffer contents at beginning of cycle */
    suppress_dispatch = (deissues || dispatch_lock);
    if (doing_interrupt) ⟨Perform one cycle of the interrupt preparations 318⟩
    else ⟨Commit and/or deissue up to commit_max instructions 67⟩;
    ⟨Execute all coroutines scheduled for the current time 125⟩;
    if (!suppress_dispatch) ⟨Dispatch one cycle's worth of instructions 74⟩;
    ticks = incr(ticks, 1);   /* and the beat moves on */
    dispatch_stat[dispatch_count]++;
  }
```

This code is used in section 10.

**65.**

```
⟨Global variables 20⟩ +≡
  int dispatch_count;       /* how many dispatched on this cycle */
  bool suppress_dispatch;   /* should dispatching be bypassed? */
  int doing_interrupt;      /* how many cycles of interrupt preparations remain */
  lockvar dispatch_lock;    /* lock to prevent instruction issues */
```

**66.**

```
⟨External variables 4⟩ +≡
  Extern int *dispatch_stat;       /* how often did we dispatch 0, 1, ... instructions? */
  Extern bool security_disabled;   /* omit security checks for testing purposes? */
```

```
ARGS = macro, §6.
bool = enum, §11.
commit_max: int, §59.
control = struct, §44.
Extern = macro, §4.
incr: octa (), MMIX-ARITH §6.
lockvar = coroutine *, §37.
mem_slots: int, §86.
old_tail: fetch *, §70.
owner: coroutine *, §44.
print_control_block: static void (), §46.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
rename_regs: int, §86.
tail: fetch *, §69.
ticks: Extern octa, §87.
```

```
67. ⟨Commit and/or deissue up to commit_max instructions 67⟩ ≡
{
    for (m = commit_max; m > 0 && deissues > 0; m--)
        ⟨Deissue the coolest instruction 145⟩;
    for (; m > 0; m--) {
        if (hot == cool) break;   /* reorder buffer is empty */
        if (!security_disabled) ⟨Check for security violation, break if so 149⟩;
        if (hot->owner) break;    /* hot seat instruction isn't finished */
        ⟨Commit the hottest instruction, or break if it's not ready 146⟩;
        i = hot->i;
        if (hot == reorder_bot) hot = reorder_top;
        else hot--;
        if (i == resum) break;    /* allow the resumed instruction to see the new rK */
    }
}
```

This code is used in section 64.
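Notice that the two loops share a single budget $m$: every deissue uses up one of the $commit\_max$ slots that would otherwise be available for committing. Here is a toy model of that sharing, assuming that each deissue decreases $deissues$ by one and ignoring the security and $resum$ checks; the function `commit_cycle` and its counter arguments are hypothetical simplifications, with `*finished` standing for the number of completed instructions waiting at the front of the reorder buffer.

```c
/* Toy model of one cycle of the commission unit.
   Returns how many instructions were committed on this cycle. */
static int commit_cycle(int commit_max, int *deissues, int *finished) {
  int m, committed = 0;
  for (m = commit_max; m > 0 && *deissues > 0; m--)
    (*deissues)--;                 /* deissue the coolest instruction */
  for (; m > 0; m--) {
    if (*finished == 0) break;     /* nothing ready in the hot seat */
    (*finished)--;                 /* commit the hottest instruction */
    committed++;
  }
  return committed;
}
```

With a budget of three, two pending deissues, and five finished instructions, the first cycle deissues twice and commits once; the next cycle, with nothing left to deissue, commits three.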

**68. The dispatch stage.** It would be nice to present the parts of this simulator by dealing with the fetching, dispatching, executing, and committing stages in that order. After all, instructions are first fetched, then dispatched, then executed, and finally committed. However, the fetch stage depends heavily on difficult questions of memory management that are best deferred until we have looked at the simpler parts of simulation. Therefore we will take our initial plunge into the details of this program by looking first at the dispatch phase, assuming that instructions have somehow appeared magically in the fetch buffer.

The fetch buffer, like the circular priority queue of all coroutines and the circular queue used for the reorder buffer, lives in an array that is best regarded as a ring of elements. The elements are structures of type **fetch**, which have five fields: A 32-bit *inst*, which is an MMIX instruction; a 64-bit *loc*, which is the virtual address of that instruction; an *interrupt* field, which is nonzero if, for example, the protection bits in the relevant page table entry for this address do not permit execution access; a boolean *noted* field, which becomes *true* after the dispatch unit has peeked at the instruction to see whether it is a jump or probable branch; and a *hist* field, which records the recent branch history. (The least significant bits of *hist* correspond to the most recent branches.)

```
⟨ Type definitions 11 ⟩ +≡
typedef struct {
   octa loc;  /* virtual address of instruction */
   tetra inst;  /* the instruction itself */
   unsigned int interrupt;  /* bit codes that might cause interruption */
   bool noted;  /* have we peeked at this instruction? */
   unsigned int hist;  /* if we peeked, this was the peek_hist */
} fetch;
```
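Since the least significant bits of *hist* correspond to the most recent branches, one natural way to maintain such a history is to shift it left and insert each new outcome at the bottom. The following is a sketch of that idea only; the function name `push_branch`, the `bits` parameter that limits how much history is kept, and the update rule itself are assumptions of this illustration, and the simulator's own rule is defined where $peek\_hist$ is maintained.

```c
/* Hypothetical history update: shift in the newest outcome
   (1 = taken, 0 = not taken) and keep only the low-order bits
   (0 < bits < number of bits in unsigned int). */
static unsigned int push_branch(unsigned int hist, int taken, int bits) {
  unsigned int mask = (1u << bits) - 1;
  return ((hist << 1) | (taken ? 1u : 0u)) & mask;
}
```

Starting from an empty history, the outcomes taken, not taken, taken produce the successive values 1, 2, and 5 (binary 101), so that bit 0 always describes the most recent branch.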

**69.** The oldest and youngest entries in the fetch buffer are pointed to by head and tail, just as the oldest and youngest entries in the reorder buffer are called hot and cool. The fetch coroutine will be adding entries at the tail position, which starts at  $old\_tail$  when a cycle begins, in parallel with the actions simulated by the dispatcher. Therefore the dispatcher is allowed to look only at instructions in head, head-1, ...,  $old\_tail+1$ , although a few more recently fetched instructions will usually be present in the fetch buffer by the time this part of the program is executed.

```
⟨External variables 4⟩ +≡
Extern fetch *fetch_bot, *fetch_top;
/* least and greatest entries in the ring containing the fetch buffer */
Extern fetch *head, *tail; /* front and rear of the fetch buffer */
```
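The ring discipline can be illustrated in isolation. The following is only a minimal sketch, not part of MMIXware: the names `slot`, `ring_prev`, `ring_count`, and the eight-slot size are assumptions made for the example. It mirrors just one rule from the text, namely that entries get younger as addresses decrease and that the bottom slot wraps around to the top.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8  /* hypothetical size; the real buffer size is configurable */

typedef struct { unsigned int inst; } slot;  /* stand-in for a fetch entry */

/* Step to the next younger position: addresses decrease, wrapping bot -> top. */
static slot *ring_prev(slot *p, slot *bot, slot *top)
{
  assert(p >= bot && p <= top);
  return p == bot ? top : p - 1;
}

/* Count the entries from head (oldest) up to, but not including, tail. */
static int ring_count(slot *head, slot *tail, slot *bot, slot *top)
{
  int n = 0;
  slot *p;
  for (p = head; p != tail; p = ring_prev(p, bot, top)) n++;
  return n;
}
```

With `head == tail` the count is zero, which matches the convention that equal pointers denote an empty buffer.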

```
bool = enum, §11.
commit_max: int, §59.
cool: control *, §60.
deissues: int, §60.
Extern = macro, §4.
hot: control *, §60.
i: internal_opcode, §44.
i: register int, §12.
m: register int, §12.
octa = struct, §17.
old_tail: fetch *, §70.
owner: coroutine *, §44.
peek_hist: unsigned int, §99.
reorder_bot: control *, §60.
reorder_top: control *, §60.
resum = 89, §49.
security_disabled: bool, §66.
tetra = unsigned int, §17.
true = 1, §11.
```

```
70. ⟨Global variables 20⟩ +≡
   fetch *old_tail;  /* rear of the fetch buffer available on the current cycle */

71. #define UNKNOWN_SPEC ((specnode *) 1)
⟨Initialize everything 22⟩ +≡
   head = tail = fetch_top;
   inst_ptr.p = UNKNOWN_SPEC;

72. ⟨Internal prototypes 13⟩ +≡
   static void print_fetch_buffer ARGS((void));

73. ⟨Subroutines 14⟩ +≡
   static void print_fetch_buffer()
   {
      printf("Fetch buffer");
      if (head == tail) printf(" (empty)\n");
      else { register fetch *p;
         if (resuming) printf(" (resumption state %d)", resuming);
         printf(":\n");
         for (p = head; p != tail; p = (p == fetch_bot ? fetch_top : p - 1)) {
            print_octa(p->loc);
            printf(": %08x(%s)", p->inst, opcode_name[p->inst >> 24]);
            if (p->interrupt) print_bits(p->interrupt);
            if (p->noted) printf("*");
            printf("\n");
         }
      }
      printf("Instruction pointer is ");
      if (inst_ptr.p == NULL) print_octa(inst_ptr.o);
      else {
         printf("waiting for ");
         if (inst_ptr.p == UNKNOWN_SPEC) printf("dispatch");
         else if (inst_ptr.p->addr.h == (tetra) -1)
            print_coroutine_id(((control *) inst_ptr.p->up)->owner);
         else print_specnode_id(inst_ptr.p->addr);
      }
      printf("\n");
   }
```

**74.** The best way to understand the dispatching process is once again to "think big," by imagining a huge fetch buffer and the potential ability to issue dozens of instructions per cycle, although the actual numbers are typically quite small.

If the fetch buffer is not empty after *dispatch\_max* instructions have been dispatched, the dispatcher also looks at up to *peekahead* further instructions to see if they are jumps or other commands that change the flow of control. Much of this action would happen in parallel on a real machine, but our simulator works sequentially.

In the following program, *true\_head* records the head of the fetch buffer as instructions are actually dispatched, while *head* refers to the position currently being examined (possibly peeking into the future).

If the fetch buffer is empty at the beginning of the current clock cycle, a "dispatch bypass" allows the dispatcher to issue the first instruction that enters the fetch buffer on this cycle. Otherwise the dispatcher is restricted to previously fetched instructions.

```
⟨Dispatch one cycle's worth of instructions 74⟩ ≡
   { register fetch *true_head, *new_head;
      true_head = head;
      if (head == old_tail && head != tail)
         old_tail = (head == fetch_bot ? fetch_top : head - 1);
      peek_hist = cool_hist;
      for (j = 0; j < dispatch_max + peekahead; j++)
         ⟨Look at the head instruction, and try to dispatch it if j < dispatch_max 75⟩;
      head = true_head;
   }
```

This code is used in section 64.
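The shape of that loop may be easier to see in a stripped-down model. The sketch below is an illustration only, under strong simplifying assumptions: there are no stalls, no dispatch lock, and no redirection of the fetch, and the names `cycle_result` and `one_cycle` are invented for the example. It shows only how the first *dispatch\_max* iterations may dispatch while the remaining *peekahead* iterations merely peek.

```c
#include <assert.h>

typedef struct { int dispatched, peeked; } cycle_result;

/* Simplified model of one dispatch cycle: no stalls, no dispatch lock. */
static cycle_result one_cycle(int available, int dispatch_max, int peekahead)
{
  cycle_result r = {0, 0};
  int j;
  assert(available >= 0 && dispatch_max >= 0 && peekahead >= 0);
  for (j = 0; j < dispatch_max + peekahead; j++) {
    if (j >= available) break;             /* fetch buffer exhausted */
    if (j < dispatch_max) r.dispatched++;  /* instruction leaves the buffer */
    else r.peeked++;                       /* only examined for control changes */
  }
  return r;
}
```

For instance, with ten instructions available, `one_cycle(10, 2, 3)` dispatches two instructions and peeks at three more.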

```
addr: octa, §40.
ARGS = macro, §6.
control = struct, §44.
cool_hist: unsigned int, §99.
dispatch_max: int, §59.
fetch = struct, §68.
fetch_bot: fetch *, §69.
fetch_top: fetch *, §69.
h: tetra, §17.
head: fetch *, §69.
inst: tetra, §68.
inst_ptr: spec, §284.
interrupt: unsigned int, §68.
j: register int, §12.
loc: octa, §68.
noted: bool, §68.
o: octa, §40.
opcode_name: char *[], §48.
owner: coroutine *, §44.
p: specnode *, §40.
peek_hist: unsigned int, §99.
peekahead: int, §59.
print_bits: static void (), §56.
print_coroutine_id: static void (), §25.
print_octa: static void (), §19.
print_specnode_id: static void (), §91.
printf: int (), <stdio.h>.
resuming: int, §78.
specnode = struct, §40.
tail: fetch *, §69.
tetra = unsigned int, §17.
up: specnode *, §40.
```

```
75. ⟨Look at the head instruction, and try to dispatch it if j < dispatch_max 75⟩ ≡
   {
      register mmix_opcode op;
      register int yz, f;
      register bool freeze_dispatch = false;
      register func *u = NULL;
      if (head == old_tail) break;  /* fetch buffer empty */
      if (head == fetch_bot) new_head = fetch_top; else new_head = head - 1;
      op = head->inst >> 24; yz = head->inst & 0xffff;
      ⟨Determine the flags, f, and the internal opcode, i 80⟩;
      ⟨Install default fields in the cool block 100⟩;
      if (f & rel_addr_bit) ⟨Convert relative address to absolute address 84⟩;
      if (head->noted) peek_hist = head->hist;
      else ⟨Redirect the fetch if control changes at this inst 85⟩;
      if (j >= dispatch_max || dispatch_lock || nullifying) {
         head = new_head; continue;  /* can't dispatch, but can peek ahead */
      }
      if (cool == reorder_bot) new_cool = reorder_top; else new_cool = cool - 1;
      ⟨Dispatch an instruction to the cool block if possible, otherwise goto stall 101⟩;
      ⟨Assign a functional unit if available, otherwise goto stall 82⟩;
      ⟨Check for sufficient rename registers and memory slots, or goto stall 111⟩;
      if ((op & 0xe0) == 0x40) ⟨Record the result of branch prediction 152⟩;
      ⟨Issue the cool instruction 81⟩;
      cool = new_cool; cool_O = new_O; cool_S = new_S;
      cool_hist = peek_hist; continue;
   stall: ⟨Undo data structures set prematurely in the cool block and break 123⟩;
   }
```

This code is used in section 74.

**76.** An instruction can be dispatched only if a functional unit is available to handle it. A functional unit consists of a 256-bit vector that specifies a subset of MMIX's opcodes, and an array of coroutines for the pipeline stages. There are k coroutines in the array, where k is the maximum number of stages needed by any of the opcodes supported.

```
⟨Type definitions 11⟩ +≡
typedef struct func_struct {
   char name[16]; /* symbolic designation */
   tetra ops[8]; /* big-endian bitmap for the opcodes supported */
   int k; /* number of pipeline stages */
   coroutine *co; /* pointer to the first of k consecutive coroutines */
} func;

77. ⟨External variables 4⟩ +≡
Extern func *funit; /* pointer to array of functional units */
Extern int funit_count; /* the number of functional units */
```

**78.** It is convenient to have a 256-bit vector of all the supported opcodes, because we need to shut off a lot of special actions when an opcode is not supported.

```
⟨Global variables 20⟩ +≡
   control *new_cool;  /* the reorder position following cool */
   int resuming;  /* set nonzero if resuming an interrupted instruction */
   tetra support[8];  /* big-endian bitmap for all opcodes supported */

79. ⟨Initialize everything 22⟩ +≡
   { register func *u;
      for (u = funit; u <= funit + funit_count; u++)
         for (i = 0; i < 8; i++) support[i] |= u->ops[i];
   }

80. #define sign_bit ((unsigned) 0x80000000)
⟨Determine the flags, f, and the internal opcode, i 80⟩ ≡
   if (!(support[op >> 5] & (sign_bit >> (op & 31)))) {
         /* oops, this opcode isn't supported by any functional unit */
      f = flags[TRAP], i = trap;
   } else f = flags[op], i = internal_op[op];
   if (i == trip && (head->loc.h & sign_bit)) f = 0, i = noop;
```

This code is used in section 75.
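The big-endian bitmap convention is compact enough to be easy to misread, so here is a small standalone sketch of it. The helper names `bitmap_set`, `bitmap_test`, and `bitmap_merge` are hypothetical and do not appear in the simulator, and the sketch assumes that `unsigned int` holds 32 bits, as **tetra** does. Opcode *op* lives in word *op* ≫ 5, and within that word opcode 0 is the most significant bit.

```c
#include <assert.h>

#define SIGN_BIT 0x80000000u

/* Mark opcode op (0..255) as supported in a 256-bit big-endian bitmap. */
static void bitmap_set(unsigned int map[8], unsigned int op)
{
  assert(op < 256);
  map[op >> 5] |= SIGN_BIT >> (op & 31);
}

/* Nonzero if opcode op is marked in the bitmap. */
static int bitmap_test(const unsigned int map[8], unsigned int op)
{
  assert(op < 256);
  return (map[op >> 5] & (SIGN_BIT >> (op & 31))) != 0;
}

/* OR one unit's bitmap into the accumulated bitmap of all units. */
static void bitmap_merge(unsigned int all[8], const unsigned int unit[8])
{
  int i;
  for (i = 0; i < 8; i++) all[i] |= unit[i];
}
```

Merging with OR rather than plain assignment is what makes the accumulated vector cover every functional unit instead of only the last one visited.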

```
bool = enum, §11.
control = struct, §44.
cool: control *, §60.
cool_hist: unsigned int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
coroutine = struct, §23.
dispatch_lock: lockvar, §65.
dispatch_max: int, §59.
Extern = macro, §4.
false = 0, §11.
fetch_bot: fetch *, §69.
fetch_top: fetch *, §69.
flags: unsigned char [], §83.
h: tetra, §17.
head: fetch *, §69.
hist: unsigned int, §68.
i: register int, §12.
i: register int, §10.
inst: tetra, §68.
internal_op: internal_opcode [], §51.
j: register int, §12.
loc: octa, §68.
mmix_opcode = enum, §47.
new_head: register fetch *, §74.
new_O: octa, §99.
new_S: octa, §99.
noop = 81, §49.
noted: bool, §68.
nullifying: bool, §315.
old_tail: fetch *, §70.
peek_hist: unsigned int, §99.
rel_addr_bit = #40, §83.
reorder_bot: control *, §60.
reorder_top: control *, §60.
tetra = unsigned int, §17.
trap = 82, §49.
TRAP = #00, §47.
trip = 83, §49.
```

```
81. ⟨Issue the cool instruction 81⟩ ≡
   if (cool->interim) {
      cool->usage = false;
      if (cool->op == SAVE) ⟨Get ready for the next step of SAVE 341⟩
      else if (cool->op == UNSAVE) ⟨Get ready for the next step of UNSAVE 335⟩
      else if (cool->i == preld || cool->i == prest)
         ⟨Get ready for the next step of PRELD or PREST 228⟩
      else if (cool->i == prego) ⟨Get ready for the next step of PREGO 229⟩
   }
   else if (cool->i < max_real_command) {
      if ((flags[cool->op] & ctl_change_bit) || cool->i == pbr)
         if (inst_ptr.p == NULL && (inst_ptr.o.h & sign_bit)
               && !(cool->loc.h & sign_bit) && cool->i != trap)
            cool->interrupt |= P_BIT;  /* jumping from nonnegative to negative */
      true_head = head = new_head;  /* delete instruction from fetch buffer */
      resuming = 0;
   }
   if (freeze_dispatch) set_lock(u->co, dispatch_lock);
   cool->owner = u->co; u->co->ctl = cool;
   startup(u->co, 1);  /* schedule execution of the new inst */
   if (verbose & issue_bit) {
      printf("Issuing "); print_control_block(cool);
      printf("  "); print_coroutine_id(u->co); printf("\n");
   }
   dispatch_count++;
```

This code is used in section 75.

**82.** We assign the first functional unit that supports *op* and is totally unoccupied, if possible; otherwise we assign the first functional unit that supports *op* and has stage 1 unoccupied.

```
⟨Assign a functional unit if available, otherwise goto stall 82⟩ ≡
   { register int t = op >> 5, b = sign_bit >> (op & 31);
      if (cool->i == trap && op != TRAP) {  /* opcode needs to be emulated */
         u = funit + funit_count;  /* this unit supports just TRIP and TRAP */
         goto unit_found;
      }
      for (u = funit; u <= funit + funit_count; u++)
         if (u->ops[t] & b) {
            for (i = 0; i < u->k; i++)
               if (u->co[i].next) goto unit_busy;
            goto unit_found;
         unit_busy: ;
         }
      for (u = funit; u < funit + funit_count; u++)
         if ((u->ops[t] & b) && (u->co->next == NULL)) goto unit_found;
      goto stall;  /* all units for this op are busy */
   }
   unit_found:
```

This code is used in section 75.
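The two-pass policy can be tried out on a toy model. What follows is a sketch under simplifying assumptions, not the simulator's code: the `unit` structure, the `MAX_STAGES` bound, and `choose_unit` are all invented for illustration, a single `supports` flag replaces the opcode bitmap, and the special TRIP/TRAP unit is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STAGES 4  /* hypothetical bound for this sketch */

typedef struct {
  int supports;          /* nonzero if the unit handles the opcode in question */
  int k;                 /* number of pipeline stages, at most MAX_STAGES */
  int busy[MAX_STAGES];  /* busy[i] nonzero if stage i+1 is occupied */
} unit;

/* Return the index of the chosen unit, or -1 if the dispatcher must stall. */
static int choose_unit(const unit *u, int count)
{
  int n, i;
  assert(u != NULL && count >= 0);
  for (n = 0; n < count; n++) {  /* pass 1: a totally unoccupied unit */
    if (!u[n].supports) continue;
    for (i = 0; i < u[n].k; i++) if (u[n].busy[i]) break;
    if (i == u[n].k) return n;
  }
  for (n = 0; n < count; n++)    /* pass 2: stage 1 unoccupied */
    if (u[n].supports && !u[n].busy[0]) return n;
  return -1;
}
```

Preferring a totally idle unit in the first pass means that an instruction enters a partially filled pipeline only when no empty one supports its opcode.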

```
co: coroutine *, §76.
cool: control *, §60.
ctl: control *, §23.
ctl_change_bit = #80, §83.
dispatch_count: int, §65.
dispatch_lock: lockvar, §65.
false = 0, §11.
flags: unsigned char [], §83.
freeze_dispatch: register bool, §75.
funit: func *, §77.
funit_count: int, §77.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
i: register int, §12.
inst_ptr: spec, §284.
interim: bool, §44.
interrupt: unsigned int, §44.
issue_bit = 1 << 0, §8.
k: int, §76.
loc: octa, §44.
max_real_command = trip, §49.
new_head: register fetch *, §74.
next: coroutine *, §23.
o: octa, §40.
op: mmix_opcode, §44.
op: register mmix_opcode, §75.
ops: tetra [], §76.
owner: coroutine *, §44.
p: specnode *, §40.
P_BIT = 1 << 0, §54.
pbr = 70, §49.
prego = 73, §49.
preld = 61, §49.
prest = 62, §49.
print_control_block: static void (), §46.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
resuming: int, §78.
SAVE = #fa, §47.
set_lock = macro (), §37.
sign_bit = macro, §80.
stall: label, §75.
startup: static void (), §31.
trap = 82, §49.
TRAP = #00, §47.
true_head: register fetch *, §74.
u: register func *, §75.
UNSAVE = #fb, §47.
usage: bool, §44.
verbose: int, §4.
```

**83.** The *flags* table records special properties of each operation code in binary notation: #1 means Z is an immediate value, #2 means rZ is a source operand, #4 means Y is an immediate value, #8 means rY is a source operand, #10 means rX is a source operand, #20 means rX is a destination, #40 means YZ is part of a relative address, and #80 means the control changes at this point.

```
#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define ctl_change_bit 0x80
⟨Global variables 20⟩ +≡
   unsigned char flags[256] = {
      0x8a, 0x2a, 0x2a, 0x2a, 0x2a, 0x26, 0x2a, 0x26,  /* TRAP, ... */
      0x26, 0x25, 0x26, 0x25, 0x26, 0x25, 0x26, 0x25,  /* FLOT, ... */
      0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x26, 0x2a, 0x26,  /* FMUL, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* MUL, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* ADD, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* 2ADDU, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x26, 0x25, 0x26, 0x25,  /* CMP, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* SL, ... */
      0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50,  /* BN, ... */
      0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50,  /* BNN, ... */
      0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50,  /* PBN, ... */
      0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50,  /* PBNN, ... */
      0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39,  /* CSN, ... */
      0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39,  /* CSNN, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* ZSN, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* ZSNN, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* LDB, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* LDT, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x3a, 0x39, 0x2a, 0x29,  /* LDSF, ... */
      0x2a, 0x29, 0x0a, 0x09, 0x0a, 0x09, 0xaa, 0xa9,  /* LDVTS, ... */
      0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19,  /* STB, ... */
      0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19,  /* STT, ... */
      0x1a, 0x19, 0x1a, 0x19, 0x0a, 0x09, 0x1a, 0x19,  /* STSF, ... */
      0x0a, 0x09, 0x0a, 0x09, 0x0a, 0x09, 0xaa, 0xa9,  /* SYNCD, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* OR, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* AND, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* BDIF, ... */
      0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29,  /* MUX, ... */
      0x20, 0x20, 0x20, 0x20, 0x30, 0x30, 0x30, 0x30,  /* SETH, ... */
      0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,  /* ORH, ... */
      0xc0, 0xc0, 0xe0, 0xe0, 0x60, 0x60, 0x02, 0x01,  /* JMP, ... */
      0x80, 0x80, 0x00, 0x02, 0x01, 0x00, 0x20, 0x8a}; /* POP, ... */
```
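As a reading aid for the table, the sketch below decodes a flags byte. It is an illustration only: the enumeration names and `source_count` are hypothetical, and the assertion encodes an assumption of this sketch (that Z is never marked both immediate and register source), not a documented invariant.

```c
#include <assert.h>

enum {
  Z_IS_IMMED = 0x01, Z_IS_SOURCE = 0x02,
  Y_IS_IMMED = 0x04, Y_IS_SOURCE = 0x08,
  X_IS_SOURCE = 0x10, X_IS_DEST = 0x20,
  REL_ADDR = 0x40, CTL_CHANGE = 0x80
};

/* Count the register source operands implied by a flags byte. */
static int source_count(unsigned char f)
{
  /* assumption of this sketch: Z is not both immediate and a register */
  assert((f & (Z_IS_IMMED | Z_IS_SOURCE)) != (Z_IS_IMMED | Z_IS_SOURCE));
  return ((f & Z_IS_SOURCE) != 0) + ((f & Y_IS_SOURCE) != 0)
       + ((f & X_IS_SOURCE) != 0);
}
```

For example, the entry #2a decomposes as #20 + #8 + #2 (rX is a destination, rY and rZ are sources), while the branch entry #50 decomposes as #40 + #10 (a relative address, with rX as the only register source).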

```
84. ⟨Convert relative address to absolute address 84⟩ ≡
   {
      if (i == jmp) yz = head->inst & 0xffffff;
      if (op & 1) yz -= (i == jmp ? 0x1000000 : 0x10000);
      cool->y.o = incr(head->loc, 4), cool->y.p = NULL;
      cool->z.o = incr(head->loc, yz << 2), cool->z.p = NULL;
   }
```

This code is used in section 75.
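A worked version of this address arithmetic may be helpful. The sketch below is not the simulator's code: `rel_target` is a hypothetical helper, 64-bit integers from `<stdint.h>` stand in for **octa** and *incr*, and a multiplication by 4 replaces the left shift so that negative offsets stay well defined in standard C. An odd opcode denotes the backward form, whose offset is reduced by 2^16 for branches and by 2^24 for jumps.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute target of a branch (16-bit YZ) or jump (24-bit XYZ) located at loc. */
static uint64_t rel_target(uint64_t loc, uint32_t inst, int is_jump)
{
  int32_t off = (int32_t)(inst & (is_jump ? 0xffffffu : 0xffffu));
  assert((loc & 3) == 0);  /* this sketch assumes an aligned instruction address */
  if ((inst >> 24) & 1)    /* odd opcode: backward form */
    off -= is_jump ? 0x1000000 : 0x10000;
  return loc + (uint64_t)((int64_t)off * 4);
}
```

With the instruction at location #100, a backward branch whose YZ field is #ffff has offset −1 and therefore targets #fc; a forward jump with XYZ = 2 targets #108.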

**85.** The location of the next instruction to be fetched is in a **spec** variable called *inst\_ptr*. A slightly tricky optimization of the POP instruction is made in the common case that the speculative value of rJ is known.

```
⟨Redirect the fetch if control changes at this inst 85⟩ ≡
   { register int predicted = 0;
      if ((op & 0xe0) == 0x40) ⟨Predict a branch outcome 151⟩;
      head->noted = true;
      head->hist = peek_hist;
      if (predicted || (f & ctl_change_bit) || (i == syncid && !(cool->loc.h & sign_bit))) {
         old_tail = tail = new_head;  /* discard all remaining fetches */
         ⟨Restart the fetch coroutine 287⟩;
         switch (i) {
         case jmp: case br: case pbr: case pushj: inst_ptr = cool->z; break;
         case pop: if (g[rJ].up->known && j < dispatch_max && !dispatch_lock && !nullifying) {
               inst_ptr.o = incr(g[rJ].up->o, yz << 2), inst_ptr.p = NULL; break;
            }  /* otherwise fall through, will wait on cool->go */
         case go: case pushgo: case trap: case resume: case syncid:
            inst_ptr.p = UNKNOWN_SPEC; break;
         case trip: inst_ptr = zero_spec; break;
         }
      }
   }
```

This code is used in section 75.

```
br = 69, §49.
cool: control *, §60.
dispatch_lock: lockvar, §65.
dispatch_max: int, §59.
f: register int, §75.
g: specnode [], §86.
go = 72, §49.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
hist: unsigned int, §68.
i: register int, §12.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
inst_ptr: spec, §284.
j: register int, §12.
jmp = 80, §49.
known: bool, §40.
loc: octa, §68.
loc: octa, §44.
new_head: register fetch *, §74.
noted: bool, §68.
nullifying: bool, §315.
o: octa, §40.
old_tail: fetch *, §70.
op: register mmix_opcode, §75.
p: specnode *, §40.
pbr = 70, §49.
peek_hist: unsigned int, §99.
pop = 75, §49.
pushgo = 74, §49.
pushj = 71, §49.
resume = 76, §49.
rJ = 4, §52.
sign_bit = macro, §80.
syncid = 65, §49.
tail: fetch *, §69.
trap = 82, §49.
trip = 83, §49.
true = 1, §11.
UNKNOWN_SPEC = macro, §71.
up: specnode *, §40.
y: spec, §44.
yz: register int, §75.
z: spec, §44.
zero_spec: spec, §41.
```

**86.** At any given time the simulated machine is in two main states, the "hot state" corresponding to instructions that have been committed and the "cool state" corresponding to all the speculative changes currently being considered. The dispatcher works with cool instructions and puts them into the reorder buffer, where they gradually get warmer and warmer. Intermediate instructions, between hot and cool, have intermediate temperatures.

A machine register like l[101] or g[250] is represented by a specnode whose *o* field is the current hot value of the register. If the *up* and *down* fields of this specnode point to the node itself, the hot and cool values of the register are identical. Otherwise *up* and *down* are pointers to the coolest and hottest ends of a doubly linked list of specnodes, representing intermediate speculative values (sometimes called "rename registers"). The rename registers are implemented as the *x* or *a* specnodes inside control blocks, for speculative instructions that use this register as a destination. Speculative instructions that use the register as a source operand point to the next-hottest specnode on the list, until the value becomes known. The doubly linked list of specnodes is an input-restricted deque: A node is inserted at the cool end when the dispatcher issues an instruction with this register as destination; a node is removed from the cool end if an instruction needs to be deissued; a node is removed from the hot end when an instruction is committed.

The special registers rA, rB, ... occupy the same array as the global registers g[32], g[33], ... . For example, rB is internally the same as g[0], because rB = 0.

```
⟨ External variables 4⟩ +≡
Extern specnode g[256]; /* global registers and special registers */
Extern specnode *l; /* the ring of local registers */
Extern int lring_size;
/* the number of on-chip local registers (must be a power of 2) */
Extern int max_rename_regs, max_mem_slots; /* capacity of reorder buffer */
Extern int rename_regs, mem_slots; /* currently unused capacity */
```

**87.** Special register rC was the clock in the original definition of MMIX. But now the clock is just an external variable, called *ticks*.

```
⟨External variables 4⟩ +≡
Extern octa ticks; /* the internal clock */
88. ⟨Global variables 20⟩ +≡
int lring_mask; /* for calculations modulo lring_size */
```

**89.** The *addr* fields in the specnode lists for registers are used to identify that register in diagnostic messages. Such addresses are negative; memory addresses are positive.

All registers are initially zero except rG, which is initially 255, and rN, which has a constant value identifying the time of compilation. (The macro ABSTIME is defined externally in the file abstime.h, which should have just been created by ABSTIME; ABSTIME is a trivial program that computes the value of the standard library function *time*(Λ). We assume that this number, which is the number of seconds in the "UNIX epoch," is less than $2^{32}$. Beware: Our assumption will fail in February of 2106.)

```
#define VERSION 1  /* version of the MMIX architecture that we support */
#define SUBVERSION 0  /* secondary byte of version number */
#define SUBSUBVERSION 0  /* further qualification to version number */
⟨Initialize everything 22⟩ +≡
   rename_regs = max_rename_regs;
   mem_slots = max_mem_slots;
   lring_mask = lring_size - 1;
   for (j = 0; j < 256; j++) {
      g[j].addr.h = sign_bit, g[j].addr.l = j, g[j].known = true;
      g[j].up = g[j].down = &g[j];
   }
   g[rG].o.l = 255;
   g[rN].o.h = (VERSION << 24) + (SUBVERSION << 16) + (SUBSUBVERSION << 8);
   g[rN].o.l = ABSTIME;  /* see comment and warning above */
   for (j = 0; j < lring_size; j++) {
      l[j].addr.h = sign_bit, l[j].addr.l = 256 + j, l[j].known = true;
      l[j].up = l[j].down = &l[j];
   }

90. ⟨Internal prototypes 13⟩ +≡
   static void print_specnode_id ARGS((octa));

91. ⟨Subroutines 14⟩ +≡
   static void print_specnode_id(a)
         octa a;
   {
      if (a.h == sign_bit) {
         if (a.l < 32) printf(special_name[a.l]);
         else if (a.l < 256) printf("g[%d]", a.l);
         else printf("l[%d]", a.l - 256);
      } else if (a.h != (tetra) -1) {
         printf("m["); print_octa(a); printf("]");
      }
   }
```
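Two pieces of arithmetic in this section can be checked by hand or by machine. The sketch below is illustrative only; `pack_version` and `days_in_32_bits` are hypothetical helpers, not part of the simulator. The first packs three version bytes the way the high tetrabyte of rN is formed above; the second computes how many whole days fit into a 32-bit count of seconds, which is the quantity behind the warning about 2106.

```c
#include <assert.h>
#include <stdint.h>

/* Pack three version bytes into the high tetrabyte of a model rN. */
static uint32_t pack_version(unsigned v, unsigned sub, unsigned subsub)
{
  assert(v < 256 && sub < 256 && subsub < 256);
  return ((uint32_t)v << 24) + ((uint32_t)sub << 16) + ((uint32_t)subsub << 8);
}

/* Whole days representable by a 32-bit count of seconds. */
static uint32_t days_in_32_bits(void)
{
  return (uint32_t)((UINT64_C(1) << 32) / 86400);
}
```

The second function yields 49710 days. The 136 years from 1970 through 2105 contain 33 leap days (2100 is not a leap year) and so account for 49673 of them; the remaining 37 days reach 7 February 2106, in agreement with the warning in the text.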

```
a: specnode, §44.
ABSTIME = macro, abstime.h.
addr: octa, §40.
ARGS = macro, §6.
cool: control *, §60.
down: specnode *, §40.
Extern = macro, §4.
h: tetra, §17.
hot: control *, §60.
j: register int, §10.
known: bool, §40.
l: tetra, §17.
o: octa, §40.
octa = struct, §17.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
rB = 0, §52.
rG = 19, §52.
rN = 9, §52.
sign_bit = macro, §80.
special_name: char *[], §53.
specnode = struct, §40.
tetra = unsigned int, §17.
time: time_t (), <time.h>.
true = 1, §11.
up: specnode *, §40.
x: specnode, §44.
```

**92.** The *specval* subroutine produces a **spec** corresponding to the currently coolest value of a given local or global register.

```
⟨ Internal prototypes 13 ⟩ +≡
  static spec specval ARGS((specnode *));

93. ⟨ Subroutines 14 ⟩ +≡
  static spec specval(r)
    specnode *r;
  { spec res;
    if (r→up→known) res.o = r→up→o, res.p = Λ;
    else res.p = r→up;
    return res;
  }
```

**94.** The *spec\_install* subroutine introduces a new speculative value at the cool end of a given doubly linked list.

```
⟨ Internal prototypes 13 ⟩ +≡
  static void spec_install ARGS((specnode *, specnode *));

95. ⟨ Subroutines 14 ⟩ +≡
  static void spec_install(r, t)      /* insert t into list r */
    specnode *r, *t;
  {
    t→up = r→up;
    t→up→down = t;
    r→up = t;
    t→down = r;
    t→addr = r→addr;
  }
```

**96.** Conversely, *spec\_rem* takes such a value out.

```
⟨ Internal prototypes 13 ⟩ +≡
  static void spec_rem ARGS((specnode *));

97. ⟨ Subroutines 14 ⟩ +≡
  static void spec_rem(t)      /* remove t from its list */
    specnode *t;
  { register specnode *u = t→up, *d = t→down;
    u→down = d; d→up = u;
  }
```
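The three subroutines above can be exercised in isolation. The following is a minimal sketch in modern C, assuming simplified stand-ins for the structures of §40: the value and address are plain integers rather than **octa**s, and `list_init` is a hypothetical helper that is not part of the program. It only illustrates that the list head's `up` link always leads to the most recently installed value.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the specnode of section 40 (assumption:
   the real structure stores octa values and has more context). */
typedef struct specnode_struct {
  unsigned long o;                    /* the value, meaningful once known */
  int known;                          /* nonzero when o is valid */
  unsigned long addr;                 /* register number or memory address */
  struct specnode_struct *up, *down;  /* neighbors in the circular list */
} specnode;

/* A spec is either a known value (p == NULL) or a pointer to a pending node. */
typedef struct { unsigned long o; specnode *p; } spec;

/* Hypothetical helper: make r an empty list whose head holds a committed value. */
void list_init(specnode *r, unsigned long addr, unsigned long value)
{
  r->up = r->down = r;
  r->known = 1; r->o = value; r->addr = addr;
}

/* Insert t at the cool end of list r, so that r->up == t afterwards. */
void spec_install(specnode *r, specnode *t)
{
  assert(r != NULL && t != NULL);
  t->up = r->up;
  t->up->down = t;
  r->up = t;
  t->down = r;
  t->addr = r->addr;
}

/* Unlink t from whatever list it is in. */
void spec_rem(specnode *t)
{
  specnode *u = t->up, *d = t->down;
  u->down = d; d->up = u;
}

/* Return the coolest value of list r: the value itself if known,
   otherwise a pointer to the node that will eventually supply it. */
spec specval(specnode *r)
{
  spec res;
  res.o = 0;
  if (r->up->known) { res.o = r->up->o; res.p = NULL; }
  else res.p = r->up;
  return res;
}
```

With an empty list, `specval` reads the head itself, because the head's `up` link points back to the head; after an install it reads the new node instead.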

98. Some special registers are so central to MMIX's operation that they are carried along with each control block in the reorder buffer instead of being treated as source and destination registers of each instruction. For example, the register stack pointers rO and rS are treated in this way. The normal specnodes for rO and rS, namely g[rO] and g[rS], are not actually used; the cool values are called *cool\_O* and *cool\_S*. (Actually *cool\_O* and *cool\_S* correspond to the register values divided by 8, since rO and rS are always multiples of 8.)

The arithmetic status register, rA, is also treated specially. Its event bits are kept up to date only at the "hot" end, by accumulating values of *arith\_exc*; an instruction to GET the value of rA will be executed only in the hot seat. The other bits of rA, which are needed to control trip handlers and floating point rounding, are treated in the normal way.

```
⟨ External variables 4⟩ +≡
Extern octa cool_O, cool_S; /* values of rO, rS before the cool instruction */
99. ⟨Global variables 20⟩ +≡
int cool_L, cool_G; /* values of rL and rG before the cool instruction */
unsigned int cool_hist, peek_hist; /* history bits for branch prediction */
octa new_O, new_S; /* values of rO, rS after cool */
```

```
addr: octa, §40.
ARGS = macro, §6.
arith_exc: unsigned int, §44.
cool: control *, §60.
down: specnode *, §40.
Extern = macro, §4.
g: specnode [], §86.
known: bool, §40.
o: octa, §40.
octa = struct, §17.
p: specnode *, §40.
rO = 10, §52.
rS = 11, §52.
spec = struct, §40.
specnode = struct, §40.
up: specnode *, §40.
```

```
100. ⟨ Install default fields in the cool block 100 ⟩ ≡
  cool→op = op; cool→i = i;
  cool→xx = (head→inst ≫ 16) & #ff; cool→yy = (head→inst ≫ 8) & #ff;
  cool→zz = (head→inst) & #ff;
  cool→loc = head→loc;
  cool→y = cool→z = cool→b = cool→ra = zero_spec;
  cool→x.o = cool→a.o = cool→rl.o = zero_octa;
  cool→x.known = false; cool→x.up = Λ;
  cool→a.known = false; cool→a.up = Λ;
  cool→rl.known = true; cool→rl.up = Λ;
  cool→need_b = cool→need_ra = cool→ren_x = cool→mem_x = cool→ren_a = cool→set_l = false;
  cool→arith_exc = cool→denin = cool→denout = 0;
  if ((head→loc.h & sign_bit) ∧ ¬(g[rU].o.h & #8000)) cool→usage = false;
  else cool→usage = ((op & (g[rU].o.h ≫ 16)) ≡ g[rU].o.h ≫ 24 ? true : false);
  new_O = cool→cur_O = cool_O; new_S = cool→cur_S = cool_S;
  cool→interrupt = head→interrupt;
  cool→hist = peek_hist;
  cool→go.o = incr(cool→loc, 4);
  cool→go.known = false, cool→go.addr.h = -1, cool→go.up = (specnode *) cool;
  cool→interim = cool→stack_alert = false;
```

This code is used in section 75.

```
101. ⟨ Dispatch an instruction to the cool block if possible, otherwise goto stall 101 ⟩ ≡
  if (new_cool ≡ hot) goto stall;      /* reorder buffer is full */
  ⟨ Make sure cool_L and cool_G are up to date 102 ⟩;
  ⟨ Install the operand fields of the cool block 103 ⟩;
  if (f & X_is_dest_bit) ⟨ Install register X as the destination, or insert an internal
      command and goto dispatch_done if X is marginal 110 ⟩;
  switch (i) {
    ⟨ Special cases of instruction dispatch 117 ⟩
  default: break;
  }
dispatch_done:
```

This code is used in section 75.

**102.** The UNSAVE operation begins by loading register rG from memory. We don't really need to know the value of rG until twelve other registers have been unsaved, so we aren't fussy about it here.

```
⟨ Make sure cool_L and cool_G are up to date 102 ⟩ ≡
  if (¬g[rL].up→known) goto stall;
  cool_L = g[rL].up→o.l;
  if (¬g[rG].up→known ∧ ¬(op ≡ UNSAVE ∧ cool→xx ≡ 1)) goto stall;
  cool_G = g[rG].up→o.l;
```

This code is used in section 101.
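The assignment to *usage* in section 100 packs several fields of rU into one test. The sketch below restates it as a stand-alone C function, under the assumption (read off from the shifts in the code above) that the top byte of rU's high tetra is a pattern, the next byte is a mask, and bit #8000 of the high tetra enables counting at negative locations. The function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Should an instruction with opcode op be counted in rU?
   rU_high is the most significant tetra of rU;
   loc_is_negative says whether the instruction's address has its sign bit set. */
int usage_counted(uint8_t op, uint32_t rU_high, int loc_is_negative)
{
  uint32_t pattern = rU_high >> 24;     /* top byte */
  uint32_t mask_bits = rU_high >> 16;   /* pattern byte and mask byte together */
  assert(loc_is_negative == 0 || loc_is_negative == 1);
  if (loc_is_negative && !(rU_high & 0x8000)) return 0;
  /* op has only eight bits, so the AND keeps just the mask byte's bits */
  return (op & mask_bits) == pattern;
}
```

For instance, a pattern of #20 with mask #fe selects the two opcodes #20 and #21 and nothing else.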

```
103. ⟨ Install the operand fields of the cool block 103 ⟩ ≡
  if (resuming) ⟨ Insert special operands when resuming an interrupted operation 324 ⟩
  else {
    if (f & #10) ⟨ Set cool→b from register X 106 ⟩
    if (third_operand[op] ∧ (cool→i ≠ trap))
      ⟨ Set cool→b and/or cool→ra from special register 108 ⟩;
    if (f & #1) cool→z.o.l = cool→zz;
    else if (f & #2) ⟨ Set cool→z from register Z 104 ⟩
    else if ((op & #f0) ≡ #e0) ⟨ Set cool→z as an immediate wyde 109 ⟩;
    if (f & #4) cool→y.o.l = cool→yy;
    else if (f & #8) ⟨ Set cool→y from register Y 105 ⟩
  }
```

This code is used in section 101.

```
104. ⟨ Set cool→z from register Z 104 ⟩ ≡
  if (cool→zz ≥ cool_G) cool→z = specval(&g[cool→zz]);
  else if (cool→zz < cool_L) cool→z = specval(&l[(cool_O.l + cool→zz) & lring_mask]);
```

This code is used in section 103.

```
105. ⟨ Set cool→y from register Y 105 ⟩ ≡
  if (cool→yy ≥ cool_G) cool→y = specval(&g[cool→yy]);
  else if (cool→yy < cool_L) cool→y = specval(&l[(cool_O.l + cool→yy) & lring_mask]);
```

This code is used in section 103.

```
106. ⟨ Set cool→b from register X 106 ⟩ ≡
  {
    if (cool→xx ≥ cool_G) cool→b = specval(&g[cool→xx]);
    else if (cool→xx < cool_L) cool→b = specval(&l[(cool_O.l + cool→xx) & lring_mask]);
    if (f & rel_addr_bit) cool→need_b = true;      /* br, pbr */
  }
```

This code is used in section 103.

```
a: specnode, §44.
addr: octa, §40.
arith_exc: unsigned int, §44.
b: spec, §44.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
cur_O: octa, §44.
cur_S: octa, §44.
denin: int, §44.
denout: int, §44.
f: register int, §75.
false = 0, §11.
g: specnode [], §86.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
hist: unsigned int, §44.
hot: control *, §60.
i: internal_opcode, §44.
i: register int, §12.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
interim: bool, §44.
interrupt: unsigned int, §44.
interrupt: unsigned int, §68.
known: bool, §40.
l: specnode *, §86.
l: tetra, §17.
loc: octa, §44.
loc: octa, §68.
lring_mask: int, §88.
mem_x: bool, §44.
need_b: bool, §44.
need_ra: bool, §44.
new_cool: control *, §78.
new_O: octa, §99.
new_S: octa, §99.
o: octa, §40.
op: mmix_opcode, §44.
op: register mmix_opcode, §75.
peek_hist: unsigned int, §99.
ra: spec, §44.
ren_a: bool, §44.
ren_x: bool, §44.
resuming: int, §78.
rG = 19, §52.
rL = 20, §52.
rl: specnode, §44.
rU = 17, §52.
set_l: bool, §44.
sign_bit = macro, §80.
specnode = struct, §40.
specval: static spec (), §93.
stack_alert: bool, §44.
stall: label, §75.
third_operand: unsigned char [], §107.
trap = 82, §49.
true = 1, §11.
UNSAVE = #fb, §47.
up: specnode *, §40.
usage: bool, §44.
x: specnode, §44.
X_is_dest_bit = #20, §83.
xx: unsigned char, §44.
y: spec, §44.
yy: unsigned char, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
zero_spec: spec, §41.
zz: unsigned char, §44.
```

107. If an operation requires a special register as third operand, that register is listed in the *third\_operand* table.

```
⟨ Global variables 20 ⟩ +≡
  unsigned char third_operand[256] = {
    0, rA, 0, 0, rA, rA, rA, rA,          /* TRAP, ... */
    rA, rA, rA, rA, rA, rA, rA, rA,       /* FLOT, ... */
    rA, rE, rE, rE, rA, rA, rA, rA,       /* FMUL, ... */
    rA, rA, 0, 0, rA, rA, rD, rD,         /* MUL, ... */
    rA, rA, 0, 0, rA, rA, 0, 0,           /* ADD, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* 2ADDU, ... */
    0, 0, 0, 0, rA, rA, 0, 0,             /* CMP, ... */
    rA, rA, 0, 0, 0, 0, 0, 0,             /* SL, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* BN, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* BNN, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* PBN, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* PBNN, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* CSN, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* CSNN, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* ZSN, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* ZSNN, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* LDB, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* LDT, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* LDSF, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* LDVTS, ... */
    rA, rA, 0, 0, rA, rA, 0, 0,           /* STB, ... */
    rA, rA, 0, 0, 0, 0, 0, 0,             /* STT, ... */
    rA, rA, 0, 0, 0, 0, 0, 0,             /* STSF, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* SYNCD, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* OR, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* AND, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* BDIF, ... */
    rM, rM, 0, 0, 0, 0, 0, 0,             /* MUX, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* SETH, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* ORH, ... */
    0, 0, 0, 0, 0, 0, 0, 0,               /* JMP, ... */
    rJ, 0, 0, 0, 0, 0, 0, 255};           /* POP, ... */
```
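How a table entry is turned into operand requests can be summarized in a few lines. The sketch below is an illustration only: `third_use` and `third_operand_use` are hypothetical names, and the register numbers are the ones quoted from §52 in the mini-indexes. It mirrors the tests that section 108 applies to a nonzero entry, with the extra assumption that a zero entry requests nothing (section 103 skips the lookup in that case).

```c
#include <assert.h>

enum { rD = 1, rE = 2, rJ = 4, rM = 5, rA = 21 };  /* numbers from section 52 */

typedef struct {
  int need_ra;   /* nonzero if rA must be supplied through the ra field */
  int need_b;    /* nonzero if a special register must be supplied through b */
  int b_reg;     /* which special register goes into b, when need_b is set */
} third_use;

third_use third_operand_use(unsigned char entry)
{
  third_use u = {0, 0, 0};
  assert(entry < 32 || entry == 255);   /* a register number, or the final table entry */
  if (entry == 0) return u;             /* no third operand at all */
  if (entry == rA || entry == rE) u.need_ra = 1;
  if (entry != rA) { u.need_b = 1; u.b_reg = entry; }
  return u;
}
```

An entry of rE therefore asks for two registers at once, rA in *ra* and rE in *b*, while an entry of rA leaves *b* free for other uses.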

**108.** The  $cool \neg b$  field is busy in operations like STB or STSF, which need rA. So we use  $cool \neg ra$  instead, when rA is needed.

```
⟨ Set cool→b and/or cool→ra from special register 108 ⟩ ≡
  {
    if (third_operand[op] ≡ rA ∨ third_operand[op] ≡ rE)
      cool→need_ra = true, cool→ra = specval(&g[rA]);
    if (third_operand[op] ≠ rA)
      cool→need_b = true, cool→b = specval(&g[third_operand[op]]);
  }
```

This code is used in section 103.

```
109. ⟨ Set cool→z as an immediate wyde 109 ⟩ ≡
  {
    switch (op & 3) {
    case 0: cool→z.o.h = yz ≪ 16; break;
    case 1: cool→z.o.h = yz; break;
    case 2: cool→z.o.l = yz ≪ 16; break;
    case 3: cool→z.o.l = yz; break;
    }
    if (i ≠ set) {      /* register X should also be the Y operand */
      cool→y = cool→b;
      cool→b = zero_spec;
    }
  }
```

This code is used in section 103.
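The switch in section 109 places a 16-bit immediate into one of the four wyde positions of an octabyte, selected by the two low-order bits of the opcode. A minimal stand-alone sketch follows; the **octa** type is simplified to two 32-bit halves as in §17, and the function name `wyde_operand` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;  /* high and low tetrabytes */

/* Build the Z operand of a wyde-immediate instruction such as SETH (#e0). */
octa wyde_operand(unsigned op, uint32_t yz)
{
  octa z = {0, 0};
  assert(yz <= 0xffff);        /* YZ is a 16-bit field */
  switch (op & 3) {
  case 0: z.h = yz << 16; break;   /* high wyde */
  case 1: z.h = yz; break;         /* medium high wyde */
  case 2: z.l = yz << 16; break;   /* medium low wyde */
  case 3: z.l = yz; break;         /* low wyde */
  }
  return z;
}
```

Only one half of the result is ever nonzero, which is why the original code can start from a zeroed *z* field and assign a single tetrabyte.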

```
b: spec, §44.
br = 69, §49.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
f: register int, §75.
g: specnode [], §86.
h: tetra, §17.
i: register int, §12.
l: specnode *, §86.
l: tetra, §17.
lring_mask: int, §88.
need_b: bool, §44.
need_ra: bool, §44.
o: octa, §40.
op: register mmix_opcode, §75.
pbr = 70, §49.
rA = 21, §52.
ra: spec, §44.
rD = 1, §52.
rE = 2, §52.
rel_addr_bit = #40, §83.
rJ = 4, §52.
rM = 5, §52.
set = 33, §49.
specval: static spec (), §93.
true = 1, §11.
xx: unsigned char, §44.
y: spec, §44.
yz: register int, §75.
z: spec, §44.
zero_spec: spec, §41.
```

```
110. ⟨ Install register X as the destination, or insert an internal command and goto
      dispatch_done if X is marginal 110 ⟩ ≡
  {
    if (cool→xx ≥ cool_G) {
      if (i ≠ pushgo ∧ i ≠ pushj ∧ i ≠ cswap)
        cool→ren_x = true, spec_install(&g[cool→xx], &cool→x);
    } else if (cool→xx < cool_L) {
      if (i ≠ cswap)
        cool→ren_x = true, spec_install(&l[(cool_O.l + cool→xx) & lring_mask], &cool→x);
    } else {      /* we need to increase L before issuing head→inst */
    increase_L: if (((cool_S.l - cool_O.l - cool_L - 1) & lring_mask) ≡ 0)
        ⟨ Insert an instruction to advance gamma 113 ⟩
      else ⟨ Insert an instruction to advance beta and L 112 ⟩;
    }
  }
```

This code is used in section 101.

```
111. ⟨ Check for sufficient rename registers and memory slots, or goto stall 111 ⟩ ≡
  if (rename_regs < cool→ren_x + cool→ren_a) goto stall;
  if (cool→mem_x)
    if (mem_slots) mem_slots--; else goto stall;
  rename_regs -= cool→ren_x + cool→ren_a;
```

This code is used in section 75.

**112.** The *incrl* instruction advances β and rL by 1 at a time when we know that β ≠ γ, in the ring of local registers.

```
⟨ Insert an instruction to advance beta and L 112 ⟩ ≡
  {
    cool→i = incrl;
    spec_install(&l[(cool_O.l + cool_L) & lring_mask], &cool→x);
    cool→need_b = cool→need_ra = false;
    cool→y = cool→z = zero_spec;
    cool→x.known = true;      /* cool→x.o = zero_octa */
    spec_install(&g[rL], &cool→rl);
    cool→rl.o.l = cool_L + 1;
    cool→ren_x = cool→set_l = true;
    op = SETH;      /* this instruction to be handled by the simplest units */
    cool→interim = true;
    goto dispatch_done;
  }
```

This code is used in section 110.

**113.** The *incgamma* instruction advances γ and rS by storing an octabyte from the local register ring to virtual memory location *cool\_S* ≪ 3.

```
⟨ Insert an instruction to advance gamma 113 ⟩ ≡
  {
    cool→need_b = cool→need_ra = false;
    cool→i = incgamma;
    new_S = incr(cool_S, 1);
    cool→b = specval(&l[cool_S.l & lring_mask]);
    cool→y.p = Λ, cool→y.o = shift_left(cool_S, 3);
    cool→z = zero_spec;
    cool→mem_x = true, spec_install(&mem, &cool→x);
    op = STOU;      /* this instruction needs to be handled by load/store unit */
    cool→interim = true;
    cool→stack_alert = ¬(cool→y.o.h & sign_bit);
    goto dispatch_done;
  }
```

This code is used in sections 110, 119, and 337.
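The choice between sections 112 and 113 rests on a single masked subtraction. The sketch below isolates that test, assuming that the ring size is a power of 2 (so that `lring_mask` is the size minus 1) and using hypothetical names `ring_action` and `grow_L_action`. Read literally, the test asks whether S is congruent to O + L + 1 modulo the ring size.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ADVANCE_BETA, ADVANCE_GAMMA } ring_action;

/* Decide which internal instruction would be inserted when L must grow.
   O, S are rO/8 and rS/8 (low-order bits suffice); L is the current rL. */
ring_action grow_L_action(uint32_t O, uint32_t S, uint32_t L, uint32_t lring_size)
{
  uint32_t lring_mask = lring_size - 1;
  assert(lring_size != 0 && (lring_size & lring_mask) == 0);  /* power of 2 */
  /* unsigned subtraction wraps around, which is harmless under the mask */
  return ((S - O - L - 1) & lring_mask) == 0 ? ADVANCE_GAMMA : ADVANCE_BETA;
}
```

Because the arithmetic is unsigned and masked, the test works unchanged when the pointers have wrapped around the ring.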

```
b: spec, §44.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
cswap = 68, §49.
dispatch_done: label, §101.
false = 0, §11.
g: specnode [], §86.
h: tetra, §17.
head: fetch *, §69.
i: register int, §12.
i: internal_opcode, §44.
incgamma = 84, §49.
incr: octa (), MMIX-ARITH §6.
incrl = 86, §49.
inst: tetra, §68.
interim: bool, §44.
known: bool, §40.
l: specnode *, §86.
l: tetra, §17.
lring_mask: int, §88.
mem: specnode, §115.
mem_slots: int, §86.
mem_x: bool, §44.
need_b: bool, §44.
need_ra: bool, §44.
new_S: octa, §99.
o: octa, §40.
op: register mmix_opcode, §75.
p: specnode *, §40.
pushgo = 74, §49.
pushj = 71, §49.
ren_a: bool, §44.
ren_x: bool, §44.
rename_regs: int, §86.
rl: specnode, §44.
rL = 20, §52.
set_l: bool, §44.
SETH = #e0, §47.
shift_left: octa (), MMIX-ARITH §7.
sign_bit = macro, §80.
spec_install: static void (), §95.
specval: static spec (), §93.
stack_alert: bool, §44.
stall: label, §75.
STOU = #ae, §47.
true = 1, §11.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
zero_spec: spec, §41.
```

114. The decgamma instruction decreases  $\gamma$  and rS by loading an octabyte from virtual memory location  $(cool\_S - 1) \ll 3$  into the local register ring. The value of  $\beta$  may need to be decreased too (by decreasing rL).

```
⟨ Insert an instruction to decrease gamma 114 ⟩ ≡
  {
    if (cool_O.l + cool_L ≡ cool_S.l + lring_size) {      /* don't let γ pass β */
      if (cool→i ≡ pop ∧ cool→xx ≡ cool_L ∧ cool_L > 1) {
        cool→i = or;      /* we'll preserve the main result by moving it down */
        head→inst -= #10000;      /* decrease X field of POP in fetch buffer */
        op = OR;
        cool→y = specval(&l[(cool_O.l + cool→xx - 1) & lring_mask]);
        spec_install(&l[(cool_O.l + cool→xx - 2) & lring_mask], &cool→x);
      }      /* decrease rL by 1 */
      spec_install(&g[rL], &cool→rl); cool→rl.o.l = cool_L - 1; cool→set_l = true;
    }
    if (cool→i ≠ or) {
      cool→i = decgamma;
      new_S = incr(cool_S, -1);
      cool→y.p = Λ, cool→y.o = shift_left(new_S, 3);
      spec_install(&l[new_S.l & lring_mask], &cool→x);
      op = LDOU;      /* this instruction needs to be handled by load/store unit */
      cool→ptr_a = (void *) mem.up;
    }
    cool→z = cool→b = zero_spec; cool→need_b = false; cool→ren_x = cool→interim = true;
    goto dispatch_done;
  }
```

This code is used in section 120.

115. Storing into memory requires a doubly linked data list of specnodes like the lists we use for local and global registers. In this case the head of the list is called *mem*, and the *addr* fields are physical addresses in memory.

```
⟨ External variables 4 ⟩ +≡
  Extern specnode mem;
```

116. The addr field of a memory specnode is all 1s until the physical address has been computed.

```
⟨ Initialize everything 22 ⟩ +≡
  mem.addr.h = mem.addr.l = -1;
  mem.up = mem.down = &mem;
```

117. The CSWAP operation is treated as a partial store, with X as a secondary output. Partial store (pst) commands read an octabyte from memory before they write it.

```
⟨ Special cases of instruction dispatch 117 ⟩ ≡
case cswap: cool→ren_a = true;
  spec_install(cool→xx ≥ cool_G ? &g[cool→xx] : &l[(cool_O.l + cool→xx) & lring_mask],
      &cool→a);
  cool→i = pst;
case st: if ((op & #fe) ≡ STCO) cool→b.o.l = cool→xx;
case pst: cool→mem_x = true, spec_install(&mem, &cool→x); break;
case ld: case ldunc: cool→ptr_a = (void *) mem.up; break;
```

See also sections 118, 119, 120, 121, 122, 227, 312, 322, 332, 337, 347, and 355.

This code is used in section 101.

118. When new data is PUT into special registers 8 or 15–20 (namely rC, rK, rQ, rU, rV, rG, or rL) it can affect many things. Therefore we stop issuing further instructions until such PUTs are committed. Moreover, we will see later that such drastic PUTs defer execution until they reach the hot seat.

```
⟨ Special cases of instruction dispatch 117 ⟩ +≡
case put: if (cool→yy ≠ 0 ∨ cool→xx ≥ 32) goto illegal_inst;
  if (cool→xx ≥ 8) {
    if (cool→xx ≤ 11 ∧ cool→xx ≠ 8) goto illegal_inst;
    if (cool→xx ≤ 18 ∧ ¬(cool→loc.h & sign_bit)) goto privileged_inst;
  }
  if (cool→xx ≡ 8 ∨ (cool→xx ≥ 15 ∧ cool→xx ≤ 20)) freeze_dispatch = true;
  cool→ren_x = true, spec_install(&g[cool→xx], &cool→x); break;
case get: if (cool→yy ∨ cool→zz ≥ 32) goto illegal_inst;
  if (cool→zz ≡ rO) cool→z.o = shift_left(cool_O, 3);
  else if (cool→zz ≡ rS) cool→z.o = shift_left(cool_S, 3);
  else cool→z = specval(&g[cool→zz]); break;
illegal_inst: cool→interrupt |= B_BIT; goto noop_inst;
case ldvts: if (cool→loc.h & sign_bit) break;
privileged_inst: cool→interrupt |= K_BIT;
noop_inst: cool→i = noop; break;
```
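The sequence of tests applied to a PUT can be read as a small classifier. The following sketch restates it as a pure function; the enumeration `put_class` and the function `classify_put` are hypothetical names introduced for illustration, and the interrupt bookkeeping (B_BIT, K_BIT) is reduced to a return code.

```c
#include <assert.h>

typedef enum {
  PUT_OK,          /* dispatched normally */
  PUT_OK_FREEZE,   /* dispatched, but further dispatching is frozen */
  PUT_ILLEGAL,     /* breaks the rules: B_BIT interrupt, becomes a no-op */
  PUT_PRIVILEGED   /* needs a negative location: K_BIT interrupt, becomes a no-op */
} put_class;

/* xx is the special register number, yy the Y field of the instruction,
   loc_is_negative tells whether the instruction sits at a negative address. */
put_class classify_put(unsigned xx, unsigned yy, int loc_is_negative)
{
  assert(xx < 256 && yy < 256);   /* both are one-byte instruction fields */
  if (yy != 0 || xx >= 32) return PUT_ILLEGAL;
  if (xx >= 8) {
    if (xx <= 11 && xx != 8) return PUT_ILLEGAL;        /* registers 9, 10, 11 */
    if (xx <= 18 && !loc_is_negative) return PUT_PRIVILEGED;
  }
  if (xx == 8 || (xx >= 15 && xx <= 20)) return PUT_OK_FREEZE;
  return PUT_OK;
}
```

Registers 19 and 20 (rG and rL) lie above the privileged range, so a user program may PUT them, yet they still freeze dispatch.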

```
a: specnode, §44.
addr: octa, §40.
b: spec, §44.
B_BIT = 1 ≪ 2, §54.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
cswap = 68, §49.
decgamma = 85, §49.
dispatch_done: label, §101.
down: specnode *, §40.
Extern = macro, §4.
false = 0, §11.
freeze_dispatch: register bool, §75.
g: specnode [], §86.
get = 54, §49.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
interim: bool, §44.
interrupt: unsigned int, §44.
K_BIT = 1 ≪ 3, §54.
l: specnode *, §86.
l: tetra, §17.
ld = 56, §49.
LDOU = #8e, §47.
ldunc = 59, §49.
ldvts = 60, §49.
loc: octa, §44.
lring_mask: int, §88.
lring_size: int, §86.
mem_x: bool, §44.
need_b: bool, §44.
new_S: octa, §99.
noop = 81, §49.
o: octa, §40.
op: register mmix_opcode, §75.
OR = #c0, §47.
or = 34, §49.
p: specnode *, §40.
pop = 75, §49.
pst = 66, §49.
ptr_a: void *, §44.
put = 55, §49.
ren_a: bool, §44.
ren_x: bool, §44.
rl: specnode, §44.
rL = 20, §52.
rO = 10, §52.
rS = 11, §52.
set_l: bool, §44.
shift_left: octa (), MMIX-ARITH §7.
sign_bit = macro, §80.
spec_install: static void (), §95.
specnode = struct, §40.
specval: static spec (), §93.
st = 63, §49.
STCO = #b4, §47.
true = 1, §11.
up: specnode *, §40.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
yy: unsigned char, §44.
z: spec, §44.
zero_spec: spec, §41.
zz: unsigned char, §44.
```

119. A PUSHGO instruction with  $X \ge G$  causes L to increase momentarily by 1, even if L = G. But the value of L will be decreased before the PUSHGO is complete, so it will never actually exceed G. Moreover, we needn't insert an *incrl* command.

```
⟨ Special cases of instruction dispatch 117 ⟩ +≡
case pushgo: inst_ptr.p = &cool→go;
case pushj:
  { register int x = cool→xx;
    if (x ≥ cool_G) {
      if (((cool_S.l - cool_O.l - cool_L - 1) & lring_mask) ≡ 0)
        ⟨ Insert an instruction to advance gamma 113 ⟩
      x = cool_L; cool_L++;
      cool→ren_x = true, spec_install(&l[(cool_O.l + x) & lring_mask], &cool→x);
    }
    cool→x.known = true, cool→x.o.h = 0, cool→x.o.l = x;
    cool→ren_a = true, spec_install(&g[rJ], &cool→a);
    cool→a.known = true, cool→a.o = incr(cool→loc, 4);
    cool→set_l = true, spec_install(&g[rL], &cool→rl); cool→rl.o.l = cool_L - x - 1;
    new_O = incr(cool_O, x + 1);
  } break;
case syncid: if (cool→loc.h & sign_bit) break;
case go: inst_ptr.p = &cool→go; break;
```
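The register-stack arithmetic of a push can be followed with concrete numbers. The sketch below extracts just that arithmetic into a hypothetical function `push_bookkeeping` with a hypothetical result type; it leaves out the renaming, the gamma check, and the rJ update, and it assumes (as section 110 guarantees by the time this code runs) that X is either at least G or less than L.

```c
#include <assert.h>

typedef struct {
  unsigned hole;         /* local register that receives the value x */
  unsigned new_L;        /* value of rL after the push */
  unsigned long new_O;   /* value of rO/8 after the push */
} push_result;

/* xx is the X field; G and L are the current rG and rL; O is rO/8. */
push_result push_bookkeeping(unsigned xx, unsigned G, unsigned L, unsigned long O)
{
  push_result r;
  unsigned x = xx;
  assert(xx >= G || xx < L);        /* marginal X was handled before this point */
  if (x >= G) { x = L; L++; }       /* push all local registers: L grows momentarily */
  r.hole = x;
  r.new_L = L - x - 1;
  r.new_O = O + x + 1;
  return r;
}
```

When X is at least G the momentary increase of L is cancelled at once, since the new rL is L + 1 - L - 1 = 0.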

120. We need to know the topmost "hidden" element of the register stack when a POP instruction is dispatched. This element is usually present in the local register ring, unless  $\gamma = \alpha$ .

Once it is known, let x be its least significant byte. We will be decreasing rO by x+1, so we may have to decrease  $\gamma$  repeatedly in order to maintain the condition rS  $\leq$  rO.

```
⟨ Special cases of instruction dispatch 117 ⟩ +≡
case pop: if (cool→xx ∧ cool_L ≥ cool→xx)
    cool→y = specval(&l[(cool_O.l + cool→xx - 1) & lring_mask]);
pop_unsave: if (cool_S.l ≡ cool_O.l) ⟨ Insert an instruction to decrease gamma 114 ⟩;
  { register tetra x;
    register int new_L;
    register specnode *p = l[(cool_O.l - 1) & lring_mask].up;
    if (p→known) x = (p→o.l) & #ff; else goto stall;
    if ((tetra)(cool_O.l - cool_S.l) ≤ x) ⟨ Insert an instruction to decrease gamma 114 ⟩;
    new_O = incr(cool_O, -x - 1);
    if (cool→i ≡ pop) new_L = x + (cool→xx ≤ cool_L ? cool→xx : cool_L + 1);
    else new_L = x;
    if (new_L > cool_G) new_L = cool_G;
    if (x < new_L) cool→ren_x = true, spec_install(&l[(cool_O.l - 1) & lring_mask], &cool→x);
    cool→set_l = true, spec_install(&g[rL], &cool→rl); cool→rl.o.l = new_L;
    if (cool→i ≡ pop) {
      cool→z.o.l = yz ≪ 2;
      if (inst_ptr.p ≡ UNKNOWN_SPEC ∧ new_head ≡ tail) inst_ptr.p = &cool→go;
    }
    break;
  }
```
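The new value of rL after a POP depends on three quantities: the byte x found in the topmost hidden element, the X field of the instruction, and the current L. A small sketch of that computation alone, with a hypothetical function name `pop_new_L` and without any of the pipeline state, may make the clamping against G easier to see.

```c
#include <assert.h>

/* x  : least significant byte of the topmost hidden stack element
   xx : X field of the POP instruction
   L,G: current values of rL and rG */
unsigned pop_new_L(unsigned x, unsigned xx, unsigned L, unsigned G)
{
  unsigned new_L;
  assert(x <= 0xff && xx <= 0xff);
  new_L = x + (xx <= L ? xx : L + 1);
  if (new_L > G) new_L = G;      /* rL may never exceed rG */
  return new_L;
}
```

The companion update in the code above is simply `new_O = cool_O - x - 1`, the inverse of the adjustment made by a push.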

```
121. ⟨ Special cases of instruction dispatch 117 ⟩ +≡
case mulu: cool→ren_a = true, spec_install(&g[rH], &cool→a); break;
case div: case divu: cool→ren_a = true, spec_install(&g[rR], &cool→a); break;
```

122. It's tempting to say that we could avoid taking up space in the reorder buffer when no operation needs to be done. A JMP instruction qualifies as a no-op in this sense, because the change of control occurs before the execution stage. However, even a no-op might have to be counted in the usage register rU, so it might get into the execution stage for that reason. A no-op can also cause a protection interrupt, if it appears in a negative location. Even more importantly, a program might get into a loop that consists entirely of jumps and no-ops; then we wouldn't be able to interrupt it, because the interruption mechanism needs to find the current location in the reorder buffer! At least one functional unit therefore needs to provide explicit support for JMP, JMPB, and SWYM.

The SWYM instruction with F\_BIT set is a special case: This is a request from the fetch coroutine for an update to the IT-cache, when the page table method isn't implemented in hardware.

```
⟨ Special cases of instruction dispatch 117⟩ +≡
case noop: if (cool¬interrupt & F_BIT) {
      cool¬go.o = cool¬y.o = cool¬loc; inst_ptr = specval(&g[rT]);
    }
    break;

123. ⟨Undo data structures set prematurely in the cool block and break 123⟩ ≡
    if (cool¬ren_x ∨ cool¬mem_x) spec_rem(&cool¬x);
    if (cool¬ren_a) spec_rem(&cool¬a);
    if (cool¬set_l) spec_rem(&cool¬rl);
    if (inst_ptr.p ≡ &cool¬go) inst_ptr.p = UNKNOWN_SPEC;
    break;
```

This code is used in section 75.

```
a: specnode, §44.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
div = 9, §49.
divu = 28, §49.
F_BIT = 1 ≪ 17, §54.
g: specnode [], §86.
go = 72, §49.
go: specnode, §44.
h: tetra, §17.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
incrl = 86, §49.
inst_ptr: spec, §284.
interrupt: unsigned int, §44.
known: bool, §40.
l: specnode *, §86.
l: tetra, §17.
loc: octa, §44.
lring_mask: int, §88.
mem_x: bool, §44.
mulu = 27, §49.
new_head: register fetch *, §74.
new_O: octa, §99.
noop = 81, §49.
o: octa, §40.
p: specnode *, §40.
pop = 75, §49.
pushgo = 74, §49.
pushj = 71, §49.
ren_a: bool, §44.
ren_x: bool, §44.
rH = 3, §52.
rJ = 4, §52.
rl: specnode, §44.
rL = 20, §52.
rR = 6, §52.
rT = 13, §52.
set_l: bool, §44.
sign_bit = macro, §80.
spec_install: static void ().
spec_rem: static void (), §97.
specnode = struct, §40.
specval: static spec (), §93.
stall: label, §75.
syncid = 65, §49.
tail: fetch *, §69.
tetra = unsigned int, §17.
true = 1, §11.
UNKNOWN_SPEC = macro, §71.
up: specnode *, §40.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
yz: register int, §75.
z: spec, §44.
```

**124.** The execution stages. MMIX's raison d'être is its ability to execute instructions. So now we want to simulate the behavior of its functional units.

Each coroutine scheduled for action at the current tick of the clock has a stage number corresponding to a particular subset of the MMIX hardware. For example, the coroutines with stage = 2 are the second stages in the pipelines of the functional units. A coroutine with stage = 0 works in the fetch unit. Several artificially large stage numbers are used to control special coroutines that do things like write data from buffers into memory.

In this program the current coroutine of interest is called *self*; hence *self→stage* is the current stage number of interest. Another key variable, *self→ctl*, is called *data*; this is the control block being operated on by the current coroutine. We typically are simulating an operation in which *data→x* is being computed as a function of *data→y* and *data→z*. The *data* record has many fields, as described earlier when we defined **control** structures; for example, *data→owner* is the same as *self*, during the execution stage, if it is nonnull.

This part of the simulator is written as if each functional unit is able to handle all 256 operations. In practice, of course, a functional unit tends to be much more specialized; the actual specialization is governed by the dispatcher, which issues an instruction only to a functional unit that supports it. Once an instruction has been dispatched, however, we can simulate it most easily if we imagine that its functional unit is universal.

Coroutines with higher *stage* numbers are processed first. The three most important variables that govern a coroutine's behavior, once *self→stage* is given, are the external operation code *data→op*, the internal operation code *data→i*, and the value of *data→state*. We typically have *data→state* = 0 when a coroutine is first fired up.

```
⟨Local variables 12⟩ +≡
register coroutine *self; /* the current coroutine being executed */
register control *data; /* the control block of the current coroutine */
```

125. When a coroutine has done all it wants to on a single cycle, it says **goto** *done*. It will not be scheduled to do any further work unless the *schedule* routine has been called since it began execution. The *wait* macro is a convenient way to say "Please schedule me to resume again at the current *data→state*" after a specified time; for example, *wait*(1) will restart a coroutine on the next clock tick.

```
#define wait(t) { schedule(self, t, data→state); goto done; }
#define pass_after(t) schedule(self + 1, t, data→state)
#define sleep { self→next = self; goto done; } /* wait forever */
#define awaken(c, t) schedule(c, t, c→ctl→state)
```

```
⟨Execute all coroutines scheduled for the current time 125⟩ ≡
  cur_time++; if (cur_time ≡ ring_size) cur_time = 0;
  for (self = queuelist(cur_time); self ≠ &sentinel; self = sentinel.next) {
     sentinel.next = self→next; self→next = Λ; /* unschedule this coroutine */
     data = self→ctl;
     if (verbose & coroutine_bit) {
        printf(" running "); print_coroutine_id(self); printf(" ");
        print_control_block(data); printf("\n");
     }
     switch (self→stage) {
     case 0: ⟨Simulate an action of the fetch coroutine 288⟩;
     case 1: ⟨Simulate the first stage of an execution pipeline 130⟩;
     default: ⟨Simulate later stages of an execution pipeline 135⟩;
     ⟨Cases for control of special coroutines 126⟩;
     }
  terminate: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  done: ;
  }
```

This code is used in section 64.

126. A special coroutine whose *stage* number is *vanish* simply goes away at its scheduled time.

```
⟨Cases for control of special coroutines 126⟩ ≡
case vanish: goto terminate;
```

See also sections 215, 217, 222, 224, 232, 237, and 257.

This code is used in section 125.

```
127. ⟨Global variables 20⟩ +≡
  coroutine mem_locker; /* trivial coroutine that vanishes */
  coroutine Dlocker; /* another */
  control vanish_ctl; /* such coroutines share a common control block */
```

```
control = struct, §44.
coroutine = struct, §23.
coroutine_bit = 1 ≪ 2, §8.
ctl: control *, §23.
cur_time: int, §29.
i: internal_opcode, §44.
lockloc: coroutine **, §23.
next: coroutine *, §23.
op: mmix_opcode, §44.
owner: coroutine *, §44.
print_control_block: static void (), §46.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
queuelist: static coroutine *(), §35.
ring_size: int, §29.
schedule: static void (), §28.
sentinel: coroutine, §36.
stage: int, §23.
state: int, §44.
vanish = 98, §129.
verbose: int, §4.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
```

```
128. ⟨Initialize everything 22⟩ +≡
  mem_locker.name = "Locker";
  mem_locker.ctl = &vanish_ctl;
  mem_locker.stage = vanish;
  Dlocker.name = "Dlocker";
  Dlocker.ctl = &vanish_ctl;
  Dlocker.stage = vanish;
  vanish_ctl.go.o.l = 4;
  for (j = 0; j < DTcache→ports; j++) DTcache→reader[j].ctl = &vanish_ctl;
  if (Dcache)
     for (j = 0; j < Dcache→ports; j++) Dcache→reader[j].ctl = &vanish_ctl;
  for (j = 0; j < ITcache→ports; j++) ITcache→reader[j].ctl = &vanish_ctl;
  if (Icache)
     for (j = 0; j < Icache→ports; j++) Icache→reader[j].ctl = &vanish_ctl;
```

129. Here is a list of the *stage* numbers for special coroutines to be defined below.

```
⟨Header definitions 6⟩ +≡
#define max_stage 99 /* exceeds all stage numbers */
#define vanish 98 /* special coroutine that just goes away */
#define flush_to_mem 97 /* coroutine for flushing from a cache to memory */
#define flush_to_S 96 /* coroutine for flushing from a cache to the S-cache */
#define fill_from_mem 95 /* coroutine for filling a cache from memory */
#define fill_from_S 94 /* coroutine for filling a cache from the S-cache */
#define fill_from_virt 93 /* coroutine for filling a translation cache */
#define write_from_wbuf 92 /* coroutine for emptying the write buffer */
#define cleanup 91 /* coroutine for cleaning the caches */
```

130. At the very beginning of stage 1, a functional unit will stall if necessary until its operands are available. As soon as the operands are all present, the *state* is set nonzero and execution proper begins.

```
⟨Simulate the first stage of an execution pipeline 130⟩ ≡
switch1: switch (data→state) {
  case 0: ⟨Wait for input data if necessary; set state = 1 if it's there 131⟩;
  case 1: ⟨Begin execution of an operation 132⟩;
  case 2: ⟨Pass data to the next stage of the pipeline 134⟩;
  case 3: ⟨Finish execution of an operation 144⟩;
  ⟨Special cases for states in the first stage 266⟩;
}
```

This code is used in section 125.

131. If some of our input data has been computed by another coroutine on the current cycle, we grab it now but wait for the next cycle. (An actual machine wouldn't have latched the data until then.)

```
⟨Wait for input data if necessary; set state = 1 if it's there 131⟩ ≡
   j = 0;
   if (data→y.p) {
      j++;
      if (data→y.p→known) data→y.o = data→y.p→o, data→y.p = Λ;
      else j += 10;
   }
   if (data→z.p) {
      j++;
      if (data→z.p→known) data→z.o = data→z.p→o, data→z.p = Λ;
      else j += 10;
   }
   if (data→b.p) {
      if (data→need_b) j++;
      if (data→b.p→known) data→b.o = data→b.p→o, data→b.p = Λ;
      else if (data→need_b) j += 10;
   }
   if (data→ra.p) {
      if (data→need_ra) j++;
      if (data→ra.p→known) data→ra.o = data→ra.p→o, data→ra.p = Λ;
      else if (data→need_ra) j += 10;
   }
   if (j < 10) data→state = 1;
   if (j) wait(1); /* otherwise we fall through to case 1 */
This code is used in section 130.
```
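The counter *j* in section 131 does double duty: its units digit records that at least one required operand was still pending when the cycle began, while an added 10 records that some required operand is still unknown. The following is a minimal sketch of that encoding in isolation. The type `operand_sketch` and the function `readiness` are hypothetical names introduced only for illustration; they are not part of the simulator, and the three flags stand in for the tests on *p*, *p→known*, and *need\_b* or *need\_ra*.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one operand slot (y, z, b, or ra). */
typedef struct {
    bool pending;  /* the operand still points at a producer (p is nonnull) */
    bool known;    /* the producer has delivered its value */
    bool needed;   /* the operation really requires this operand */
} operand_sketch;

/* Returns the counter j of section 131 for n operands (n is at most 4
   in the simulator, so the units digit can never reach 10).
   j == 0 : nothing was pending; execution can begin in this cycle.
   j < 10 : all required values have arrived, but at least one was
            grabbed only now, so execution begins on the next cycle.
   j >= 10: some required value is still unknown; try again later. */
int readiness(operand_sketch ops[], int n)
{
    int j = 0;
    for (int k = 0; k < n; k++) {
        if (!ops[k].pending) continue;
        if (ops[k].needed) j++;
        if (ops[k].known) ops[k].pending = false;  /* grab the value */
        else if (ops[k].needed) j += 10;
    }
    return j;
}
```

For the *y* and *z* operands the simulator increments *j* unconditionally, which corresponds to setting `needed` to true in this sketch.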

```
b: spec, §44.
ctl: control *, §23.
data: register control *, §124.
Dcache: cache *, §168.
Dlocker: coroutine, §127.
DTcache: cache *, §168.
go: specnode, §44.
Icache: cache *, §168.
ITcache: cache *, §168.
j: register int, §10.
j: register int, §12.
known: bool, §40.
l: tetra, §17.
mem_locker: coroutine, §127.
name: char *, §23.
need_b: bool, §44.
need_ra: bool, §44.
o: octa, §40.
p: specnode *, §40.
ports: int, §167.
ra: spec, §44.
reader: coroutine *, §167.
stage: int, §23.
state: int, §44.
vanish_ctl: control, §127.
wait = macro (), §125.
y: spec, §44.
z: spec, §44.
```

132. Simple register-to-register instructions like ADD are assumed to take just one cycle, but others like FADD almost certainly require more time. This simulator can be configured so that FADD might take, say, four pipeline stages of one cycle each (1+1+1+1), or two pipeline stages of two cycles each (2+2), or a single unpipelined stage lasting four cycles (4), etc. In any case the simulator computes the results now, for simplicity, placing them in *data→x* and possibly also in *data→a* and/or *data→interrupt*. The results will not be officially made known until the proper time.

```
⟨ Begin execution of an operation 132⟩ ≡
switch (data¬i) {
   ⟨ Cases to compute the results of register-to-register operation 137⟩;
   ⟨ Cases to compute the virtual address of a memory operation 265⟩;
   ⟨ Cases for stage 1 execution 155⟩;
}
⟨ Set things up so that the results become known when they should 133⟩;
This code is used in section 130.
```

**133.** If the internal opcode *data→i* is *max\_pipe\_op* or less, a special pipeline sequence like 1+1+1+1 or 2+2 or 15+10, etc., has been configured. Otherwise we assume that the pipeline sequence is simply 1.

Suppose the pipeline sequence is $t_1 + t_2 + \cdots + t_k$. Each $t_j$ is positive and less than 256, so we represent the sequence as a string *pipe\_seq*[*data→i*] of unsigned "characters," terminated by 0. Given such a string, we want to do the following: Wait $(t_1 - 1)$ cycles and pass *data* to stage 2; wait $t_2$ cycles and pass *data* to stage 3; ...; wait $t_{k-1}$ cycles and pass *data* to stage $k$; wait $t_k$ cycles and make the results *known*.

The value of *denin* is added to $t_1$; the value of *denout* is added to $t_k$.

```
⟨Set things up so that the results become known when they should 133⟩ ≡
  data→state = 3;
  if (data→i ≤ max_pipe_op) { register unsigned char *s = pipe_seq[data→i];
     j = s[0] + data→denin;
     if (s[1]) data→state = 2; /* more than one stage */
     else j += data→denout;
     if (j > 1) wait(j − 1);
  }
  goto switch1;
```

This code is used in section 132.
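As a rough illustration of the timing rule in section 133, the sketch below turns a 0-terminated pipeline sequence into the list of waits it implies. The function names `pipe_waits` and `total_wait` are hypothetical and do not occur in the simulator; the sketch assumes only what the text states, namely that the first wait is $t_1 - 1$, that *denin* is charged to $t_1$, and that *denout* is charged to $t_k$.

```c
#include <stddef.h>

/* Hypothetical helper: given a 0-terminated sequence t_1, ..., t_k
   (laid out like pipe_seq[i]), store in wait[j] the number of cycles
   spent waiting before hand-off number j+1, and return k.
   The caller must supply room for k entries. */
size_t pipe_waits(const unsigned char *s, int denin, int denout, int wait[])
{
    size_t k = 0;
    while (s[k] != 0) {
        wait[k] = s[k];
        k++;
    }
    if (k == 0) return 0;
    wait[0] += denin - 1;   /* stage 1 has already used the current cycle */
    wait[k - 1] += denout;  /* denout is charged to the last stage */
    return k;
}

/* Total number of cycles waited before the results are made known. */
int total_wait(const unsigned char *s, int denin, int denout)
{
    int sum = 0;
    size_t k;
    for (k = 0; s[k] != 0; k++) sum += s[k];
    if (k == 0) return 0;
    return sum + denin + denout - 1;
}
```

Under these assumptions the sequence 2+2 with no denormal penalties waits one cycle in stage 1 and two cycles in stage 2, three cycles in all, while the default sequence 1 involves no waiting.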

**134.** When we're in stage j, the coroutine for stage j + 1 of the same functional unit is self + 1.

```
⟨Pass data to the next stage of the pipeline 134⟩ ≡
pass_data: if ((self + 1)→next) wait(1); /* stall if the next stage is occupied */
  { register unsigned char *s = pipe_seq[data→i];
     j = s[self→stage];
     if (s[self→stage + 1] ≡ 0) j += data→denout, data→state = 3;
        /* the next stage is the last */
     pass_after(j);
  }
passit: (self + 1)→ctl = data;
  data→owner = self + 1;
  goto done;
```

This code is used in section 130.

```
135. ⟨Simulate later stages of an execution pipeline 135⟩ ≡
switch2: if (data→b.p ∧ data→b.p→known) data→b.o = data→b.p→o, data→b.p = Λ;
switch (data→state) {
  case 0: panic(confusion("switch2"));
  case 1: ⟨Begin execution of a stage-two operation 351⟩;
  case 2: goto pass_data;
  case 3: goto fin_ex;
  ⟨Special cases for states in later stages 272⟩;
}
This code is used in section 125.
```

136. The default pipeline times use only one stage; they can be overridden by *MMIX\_config*. The total number of stages supported by this simulator is limited to 90, since it must never interfere with the *stage* numbers for special coroutines defined below. (The author doesn't feel guilty about making this restriction.)

```
⟨External variables 4⟩ +≡
#define pipe_limit 90
Extern unsigned char pipe_seq[max_pipe_op + 1][pipe_limit + 1];
```

137. The simplest of all register-to-register operations is *set*, which occurs for commands like SETH as well as for commands like GETA. (We might as well start with the easy cases and work our way up.)

```
⟨Cases to compute the results of register-to-register operation 137⟩ ≡
case set: data→x.o = data→z.o; break;
See also sections 138, 139, 140, 141, 142, 143, 343, 344, 345, 346, 348, and 350.
This code is used in section 132.
```

```
a: specnode, §44.
b: spec, §44.
confusion = macro (), §13.
ctl: control *, §23.
data: register control *, §124.
denin: int, §44.
denout: int, §44.
done: label, §125.
Extern = macro, §4.
fin_ex: label, §144.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
j: register int, §12.
known: bool, §40.
max_pipe_op = feps, §49.
MMIX_config: void (), MMIX-CONFIG §38.
next: coroutine *, §23.
o: octa, §40.
owner: coroutine *, §44.
p: specnode *, §40.
panic = macro (), §13.
pass_after = macro (), §125.
self: register coroutine *, §124.
set = 33, §49.
stage: int, §23.
state: int, §44.
switch1: label, §130.
wait = macro (), §125.
x: specnode, §44.
z: spec, §44.
```

**138.** Here are the basic boolean operations, which account for 24 of MMIX's 256 opcodes.

```
⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case or: data→x.o.h = data→y.o.h | data→z.o.h;
    data→x.o.l = data→y.o.l | data→z.o.l;
    break;
case orn: data→x.o.h = data→y.o.h | ~data→z.o.h;
    data→x.o.l = data→y.o.l | ~data→z.o.l;
    break;
case nor: data→x.o.h = ~(data→y.o.h | data→z.o.h);
    data→x.o.l = ~(data→y.o.l | data→z.o.l);
    break;
case and: data→x.o.h = data→y.o.h & data→z.o.h;
    data→x.o.l = data→y.o.l & data→z.o.l;
    break;
case andn: data→x.o.h = data→y.o.h & ~data→z.o.h;
    data→x.o.l = data→y.o.l & ~data→z.o.l;
    break;
case nand: data→x.o.h = ~(data→y.o.h & data→z.o.h);
    data→x.o.l = ~(data→y.o.l & data→z.o.l);
    break;
case xor: data→x.o.h = data→y.o.h ⊕ data→z.o.h;
    data→x.o.l = data→y.o.l ⊕ data→z.o.l;
    break;
case nxor: data→x.o.h = data→y.o.h ⊕ ~data→z.o.h;
    data→x.o.l = data→y.o.l ⊕ ~data→z.o.l;
    break;
```

139. The implementation of ADDU is only slightly more difficult. It would be trivial except for the fact that internal opcode *addu* is used not only for the ADDU[I] and INC[M][H,L] operations, in which we simply want to add *data→y.o* to *data→z.o*, but also for operations like 4ADDU.

```
⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case addu: data→x.o = oplus((data→op & #f8) ≡ #28 ?
        shift_left(data→y.o, 1 + ((data→op ≫ 1) & #3)) : data→y.o, data→z.o);
    break;
case subu: data→x.o = ominus(data→y.o, data→z.o); break;
```
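The opcode test in the *addu* case can be tried out on its own. The sketch below uses a native 64-bit integer in place of the simulator's **octa**, and the function name `addu_result` is hypothetical. It assumes the MMIX opcode numbering in which 2ADDU, 4ADDU, 8ADDU, and 16ADDU occupy #28 through #2f (each with an immediate variant at the next odd opcode), so that bits 1 and 2 of the opcode select a shift of 1, 2, 3, or 4.

```c
#include <stdint.h>

/* Sketch of the addu case of section 139 with uint64_t standing in
   for octa. For opcodes #28..#2f the y operand is first scaled by
   2, 4, 8, or 16; for every other opcode it is used unchanged. */
uint64_t addu_result(unsigned op, uint64_t y, uint64_t z)
{
    if ((op & 0xf8) == 0x28)
        y <<= 1 + ((op >> 1) & 0x3);  /* shift count is 1, 2, 3, or 4 */
    return y + z;  /* unsigned addition wraps modulo 2^64 */
}
```

With y = 3 and z = 1, opcode #2a (4ADDU) yields 13, whereas opcode #22 (plain ADDU) yields 4.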

140. Signed addition and subtraction produce the same results as their unsigned counterparts, but overflow must also be detected. Overflow occurs when adding y to z if and only if y and z have the same sign but their sum has a different sign. Overflow occurs in the calculation x = y - z if and only if it occurs in the calculation y = x + z.

```
⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case add: data→x.o = oplus(data→y.o, data→z.o);
    if (((data→y.o.h ⊕ data→z.o.h) & sign_bit) ≡ 0 ∧
        ((data→y.o.h ⊕ data→x.o.h) & sign_bit) ≠ 0) data→interrupt |= V_BIT;
    break;
case sub: data→x.o = ominus(data→y.o, data→z.o);
    if (((data→x.o.h ⊕ data→z.o.h) & sign_bit) ≡ 0 ∧
        ((data→y.o.h ⊕ data→x.o.h) & sign_bit) ≠ 0) data→interrupt |= V_BIT;
    break;
```
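The two sign tests of section 140 can be checked independently of the simulator. The following sketch restates them on native 64-bit unsigned integers, which wrap around in the same way as *oplus* and *ominus*; the function names `add_overflows` and `sub_overflows` and the macro `SIGN64` are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define SIGN64 (UINT64_C(1) << 63)

/* x = y + z overflows iff y and z have the same sign
   but x has a different sign from y. */
bool add_overflows(uint64_t y, uint64_t z)
{
    uint64_t x = y + z;
    return ((y ^ z) & SIGN64) == 0 && ((y ^ x) & SIGN64) != 0;
}

/* x = y - z overflows iff y = x + z overflows, that is, iff x and z
   have the same sign but y has a different sign from x. */
bool sub_overflows(uint64_t y, uint64_t z)
{
    uint64_t x = y - z;
    return ((x ^ z) & SIGN64) == 0 && ((y ^ x) & SIGN64) != 0;
}
```

For example, adding 1 to the largest positive value $2^{63}-1$ is reported as an overflow, and so is subtracting 1 from $-2^{63}$; adding 1 and $-1$ is not.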

141. The shift commands might take more than one cycle, or they might even be pipelined, if the default value of *pipe\_seq*[*sh*] is changed. But we compute shifts all at once here, because other parts of the simulator will take care of the pipeline timing. (Notice that *shlu* is changed to *sh*, for this reason. Similar changes to the internal op codes are made for other operators below.)

```
#define shift_amt (data→z.o.h ∨ data→z.o.l ≥ 64 ? 64 : data→z.o.l)

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case shlu: data→x.o = shift_left(data→y.o, shift_amt); data→i = sh; break;
case shl: data→x.o = shift_left(data→y.o, shift_amt); data→i = sh;
    { octa tmpo;
       tmpo = shift_right(data→x.o, shift_amt, 0);
       if (tmpo.h ≠ data→y.o.h ∨ tmpo.l ≠ data→y.o.l) data→interrupt |= V_BIT;
    } break;
case shru: data→x.o = shift_right(data→y.o, shift_amt, 1); data→i = sh; break;
case shr: data→x.o = shift_right(data→y.o, shift_amt, 0); data→i = sh; break;
```
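The overflow test for *shl* shifts the result back to the right with sign extension and compares it with the original operand. A minimal sketch of that idea on native 64-bit integers follows. All names here (`clamp_shift`, `shl64`, `sar64`, `shl_overflows`) are hypothetical, and the helpers are written so that a shift count of 64 is well defined, which plain C shifts are not.

```c
#include <stdint.h>
#include <stdbool.h>

/* Clamp a 64-bit shift count to 64, in the spirit of shift_amt. */
unsigned clamp_shift(uint64_t z) { return z >= 64 ? 64 : (unsigned)z; }

/* Logical shift left by s, where 0 <= s <= 64. */
uint64_t shl64(uint64_t v, unsigned s) { return s >= 64 ? 0 : v << s; }

/* Arithmetic (sign-extending) shift right by s, where 0 <= s <= 64,
   written without relying on >> applied to negative values. */
uint64_t sar64(uint64_t v, unsigned s)
{
    uint64_t sign_fill = (v >> 63) ? UINT64_MAX : 0;
    if (s >= 64) return sign_fill;
    if (s == 0) return v;
    return (v >> s) | (sign_fill << (64 - s));
}

/* A signed left shift overflows iff shifting the result back
   does not restore the original operand. */
bool shl_overflows(uint64_t y, uint64_t z)
{
    unsigned s = clamp_shift(z);
    return sar64(shl64(y, s), s) != y;
}
```

Shifting 1 left by 62 is safe, but shifting it by 63 moves a 1 into the sign position, so the round trip yields $-1$ and the overflow is detected.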

**142.** The MUX operation has three operands, namely *data→y*, *data→z*, and *data→b*; the third operand is the current (speculative) value of rM, the special mask register. Otherwise MUX is unexceptional.

```
⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case mux: data→x.o.h = (data→y.o.h & data→b.o.h) + (data→z.o.h & ~data→b.o.h);
    data→x.o.l = (data→y.o.l & data→b.o.l) + (data→z.o.l & ~data→b.o.l);
    break;
```
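The *mux* case adds two masked values instead of OR-ing them. That is harmless because the two terms never have a 1 bit in the same position, so no carries arise. A small sketch on a native 64-bit integer, with the hypothetical name `mux64`:

```c
#include <stdint.h>

/* Bitwise multiplex: each result bit comes from y where mask has a 1
   and from z where mask has a 0. The two terms are disjoint, so
   addition produces no carries and agrees with bitwise OR. */
uint64_t mux64(uint64_t y, uint64_t z, uint64_t mask)
{
    return (y & mask) + (z & ~mask);
}
```

For instance, with y = #aaaa, z = #5555, and mask = #ff00, the result is #aa55.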

```
add = 29, §49.
addu = 30, §49.
and = 37, §49.
andn = 38, §49.
b: spec, §44.
data: register control *, §124.
h: tetra, §17.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
l: tetra, §17.
mux = 11, §49.
nand = 39, §49.
nor = 36, §49.
nxor = 41, §49.
o: octa, §40.
octa = struct, §17.
ominus: octa (), MMIX-ARITH §5.
op: mmix_opcode, §44.
oplus: octa (), MMIX-ARITH §5.
or = 34, §49.
orn = 35, §49.
pipe_seq: unsigned char [][], §136.
sh = 10, §49.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
shl = 44, §49.
shlu = 42, §49.
shr = 45, §49.
shru = 43, §49.
sign_bit = macro, §80.
sub = 31, §49.
subu = 32, §49.
V_BIT = 1 ≪ 14, §54.
x: specnode, §44.
xor = 40, §49.
y: spec, §44.
z: spec, §44.
```

**143.** Comparisons are a breeze.

```
⟨ Cases to compute the results of register-to-register operation 137⟩ +≡
case cmp: if ((data¬y.o.h & sign_bit) > (data¬z.o.h & sign_bit)) goto cmp_neg;
if ((data¬y.o.h & sign_bit) < (data¬z.o.h & sign_bit)) goto cmp_pos;
case cmpu: if (data¬y.o.h < data¬z.o.h) goto cmp_neg;
if (data¬y.o.h > data¬z.o.h) goto cmp_pos;
if (data¬y.o.l < data¬z.o.l) goto cmp_neg;
if (data¬y.o.l > data¬z.o.l) goto cmp_pos;
cmp_zero: break; /* data¬x is zero */
cmp_pos: data¬x.o.l = 1; break; /* data¬x.o.h is zero */
cmp_neg: data¬x.o = neg_one; break;
```
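The *cmp* case falls through into *cmpu* once it has established that both operands have the same sign, because two's complement bit patterns of equal sign are ordered the same way whether they are read as signed or unsigned. The sketch below shows the same structure with ordinary return values instead of labels. The type `octa_sketch` is a hypothetical stand-in for **octa** (a pair of 32-bit halves), and the function names are likewise invented for illustration.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_sketch;  /* high and low halves */

#define SIGN32 UINT32_C(0x80000000)

/* Unsigned comparison: returns -1, 0, or +1. */
int cmpu_sketch(octa_sketch y, octa_sketch z)
{
    if (y.h != z.h) return y.h < z.h ? -1 : 1;
    if (y.l != z.l) return y.l < z.l ? -1 : 1;
    return 0;
}

/* Signed comparison: if the signs differ, the negative operand is the
   smaller one; otherwise the unsigned comparison gives the answer. */
int cmp_sketch(octa_sketch y, octa_sketch z)
{
    if ((y.h & SIGN32) > (z.h & SIGN32)) return -1;  /* y < 0 <= z */
    if ((y.h & SIGN32) < (z.h & SIGN32)) return 1;   /* z < 0 <= y */
    return cmpu_sketch(y, z);
}
```

Comparing the bit pattern of $-1$ with that of 1 therefore gives $-1$ under `cmp_sketch` but $+1$ under `cmpu_sketch`.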

144. The other operations will be deferred until later, now that we understand the basic ideas. But one more piece of code ought to be written before we move on, because it completes the execution stage for the simple cases already considered.

The *ren\_x* and *ren\_a* fields tell us whether the *x* and/or *a* fields contain valid information that should become officially known.

```
⟨ Finish execution of an operation 144⟩ ≡
fin_ex: if (data¬ren_x) data¬x.known = true;
else if (data¬mem_x) {
    data¬x.known = true;
    if (¬(data¬x.addr.h & #ffff0000)) data¬x.addr.l &= −8;
}
if (data¬ren_a) data¬a.known = true;
if (data¬loc.h & sign_bit) data¬ra.o.l = 0;
    /* no trips enabled for the operating system */
if (data¬interrupt & #ffff) ⟨ Handle interrupt at end of execution stage 307⟩;
die: data¬owner = Λ; goto terminate; /* this coroutine now fades away */
This code is used in section 130.
```

145. The commission/deissue stage. Control blocks leave the reorder buffer either at the hot end (when they're committed) or at the cool end (when they're deissued). We hope most of them are committed, but from time to time our speculation is incorrect and we must deissue a sequence of instructions that prove to be unwanted. Deissuing must take priority over committing, because the dispatcher cannot do anything until the machine's cool state has stabilized.

Deissuing changes the cool state by undoing the most recently issued instructions, in reverse order. Committing changes the hot state by doing the least recently issued instructions, in their original order. Both operations are similar, so we assume that they take the same time; at most *commit\_max* instructions are deissued and/or committed on each clock cycle.

```
⟨ Deissue the coolest instruction 145⟩ ≡
{
    cool = (cool ≡ reorder_top ? reorder_bot : cool + 1);
    if (verbose & issue_bit) {
        printf("Deissuing□"); print_control_block(cool);
        if (cool¬owner) { printf("□"); print_coroutine_id(cool¬owner); }
        printf("\n");
    }
    if (cool¬ren_x) rename_regs ++, spec_rem(&cool¬x);
    if (cool¬ren_a) rename_regs ++, spec_rem(&cool¬a);
    if (cool¬mem_x) mem_slots ++, spec_rem(&cool¬x);
    if (cool¬set_l) spec_rem(&cool¬rl);
    if (cool¬owner) {
        if (cool¬owner¬lockloc) *(cool¬owner¬lockloc) = Λ, cool¬owner¬lockloc = Λ;
        if (cool¬owner¬next) unschedule(cool¬owner);
    }
    cool_O = cool¬cur_O; cool_S = cool¬cur_S;
    deissues --;
}
```

This code is used in section 67.

```
a: specnode, §44.
addr: octa, §40.
cmp = 46, §49.
cmpu = 47, §49.
commit_max: int, §59.
cool: control *, §60.
cool_O: octa, §98.
cool_S: octa, §98.
cur_O: octa, §44.
cur_S: octa, §44.
data: register control *, §124.
deissues: int, §60.
h: tetra, §17.
interrupt: unsigned int, §44.
issue_bit = 1 ≪ 0, §8.
known: bool, §40.
l: tetra, §17.
loc: octa, §44.
lockloc: coroutine **, §23.
mem_slots: int, §86.
mem_x: bool, §44.
neg_one: octa, MMIX-ARITH §4.
next: coroutine *, §23.
o: octa, §40.
owner: coroutine *, §44.
print_control_block: static void (), §46.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
ra: spec, §44.
ren_a: bool, §44.
ren_x: bool, §44.
rename_regs: int, §86.
reorder_bot: control *, §60.
reorder_top: control *, §60.
rl: specnode, §44.
set_l: bool, §44.
sign_bit = macro, §80.
spec_rem: static void (), §97.
terminate: label, §125.
true = 1, §11.
unschedule: static void (), §33.
verbose: int, §4.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
```

```
146. ⟨ Commit the hottest instruction, or break if it's not ready 146 ⟩ ≡
{
    if (nullifying) ⟨ Nullify the hottest instruction 147 ⟩
    else {
        if (hot→i ≡ get ∧ hot→zz ≡ rQ) new_Q = oandn(g[rQ].o, hot→x.o);
        else if (hot→i ≡ put ∧ hot→xx ≡ rQ) hot→x.o.h |= new_Q.h, hot→x.o.l |= new_Q.l;
        if (hot→mem_x) ⟨ Commit to memory if possible, otherwise break 256 ⟩;
        if (hot→stack_alert) stack_overflow = true;
        else if (stack_overflow ∧ ¬hot→interim) {
            g[rQ].o.l |= STACK_OVERFLOW, new_Q.l |= STACK_OVERFLOW, stack_overflow = false;
            if (verbose & issue_bit) {
                printf("␣setting␣rQ="); print_octa(g[rQ].o); printf("\n");
            }
        }
        if (verbose & issue_bit) {
            printf("Committing␣"); print_control_block(hot); printf("\n");
        }
        if (hot→ren_x) rename_regs++, hot→x.up→o = hot→x.o, spec_rem(&(hot→x));
        if (hot→ren_a) rename_regs++, hot→a.up→o = hot→a.o, spec_rem(&(hot→a));
        if (hot→set_l) hot→rl.up→o = hot→rl.o, spec_rem(&(hot→rl));
        if (hot→arith_exc) g[rA].o.l |= hot→arith_exc;
        if (hot→usage) {
            g[rU].o.l++; if (g[rU].o.l ≡ 0) {
                g[rU].o.h++; if ((g[rU].o.h & #7fff) ≡ 0) g[rU].o.h -= #8000;
            }
        }
    }
    if (hot→interrupt ≥ H_BIT) ⟨ Begin an interruption and break 317 ⟩;
}
```

This code is used in section 67.

147. A load or store instruction is "nullified" if it is about to be captured by a trap interrupt. In such cases it will be the only item in the reorder buffer; thus nullifying is sort of a cross between deissuing and committing. (It is important to have stopped dispatching when nullification is necessary, because instructions such as *incgamma* and *decgamma* change rS, and we need to change it back when an unexpected interruption occurs.)

```
⟨ Nullify the hottest instruction 147 ⟩ ≡
{
    if (verbose & issue_bit) {
        printf("Nullifying␣"); print_control_block(hot); printf("\n");
    }
    if (hot→ren_x) rename_regs++, spec_rem(&hot→x);
    if (hot→ren_a) rename_regs++, spec_rem(&hot→a);
    if (hot→mem_x) mem_slots++, spec_rem(&hot→x);
    if (hot→set_l) spec_rem(&hot→rl);
    cool_O = hot→cur_O, cool_S = hot→cur_S;
    nullifying = false;
}
```

This code is used in section 146.

148. Interrupt bits in rQ might be lost if they are set between a GET and a PUT. Therefore we don't allow PUT to zero out bits that have become 1 since the most recently committed GET.

```
⟨ Global variables 20 ⟩ +≡
  octa new_Q; /* when rQ increases in any bit position, so should this */
  bool stack_overflow; /* stack overflow not yet reported */
```
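The bookkeeping behind *new\_Q* can be tried out in isolation. The following is only a sketch, assuming that 64-bit integers stand in for the simulator's **octa** values; the type `rq_model` and its three functions are hypothetical names introduced for this illustration and are not part of MMIXware.

```c
#include <stdint.h>

/* Hypothetical stand-alone model of rQ and new_Q (not the simulator's code). */
typedef struct {
    uint64_t rQ;    /* the interrupt request register */
    uint64_t new_Q; /* bits of rQ that became 1 since the last committed GET */
} rq_model;

/* Committing GET x,rQ; seen is the value that the GET delivered. */
void rq_commit_get(rq_model *m, uint64_t seen) {
    m->new_Q = m->rQ & ~seen; /* the role played by oandn in section 146 */
}

/* An interrupt request arrives: whenever rQ gains a bit, so does new_Q. */
void rq_raise(rq_model *m, uint64_t bits) {
    m->rQ |= bits;
    m->new_Q |= bits;
}

/* Committing PUT rQ,value; bits raised since the GET cannot be zeroed. */
void rq_commit_put(rq_model *m, uint64_t value) {
    m->rQ = value | m->new_Q;
}
```

For example, if rQ holds #5 when a GET reads it, a new request then sets bit #8, and the program finally PUTs zero into rQ, this model leaves rQ equal to #8 instead of losing the request.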

a: specnode, §44. arith\_exc: unsigned int, §44. bool = enum, §11. cool\_O: octa, §98. cool\_S: octa, §98. cur\_O: octa, §44. cur\_S: octa, §44. decgamma = 85, §49. false = 0, §11. g: specnode [], §86. get = 54, §49. h: tetra, §17. H\_BIT = 1 ≪ 16, §54. hot: control \*, §60. i: internal\_opcode, §44. incgamma = 84, §49. interim: bool, §44.

interrupt: unsigned int, §44. issue\_bit = 1 ≪ 0, §8. l: tetra, §17. mem\_slots: int, §86. mem\_x: bool, §44. nullifying: bool, §315. o: octa, §40. oandn: octa (), MMIX-ARITH §25. octa = struct, §17. print\_control\_block: static void (), §46. print\_octa: static void (), §19. printf: int (), <stdio.h>. put = 55, §49. rA = 21, §52. ren\_a: bool, §44.

ren\_x: bool, §44. rename\_regs: int, §86. rl: specnode, §44. rQ = 16, §52. rU = 17, §52. set\_l: bool, §44. spec\_rem: static void (), §97. stack\_alert: bool, §44. STACK\_OVERFLOW = 1 ≪ 7, §57. true = 1, §11. up: specnode \*, §40. usage: bool, §44. verbose: int, §4. x: specnode, §44. xx: unsigned char, §44. zz: unsigned char, §44.

149. An instruction will not be committed immediately if it violates the basic security rule of MMIX: An instruction in a nonnegative location should not be performed unless all eight of the internal interrupts have been enabled in the interrupt mask register rK. Conversely, an instruction in a negative location should not be performed if the P\_BIT is enabled in rK.

Such instructions take one extra cycle before they are committed. The nonnegative-location case turns on the S\_BIT of both rK and rQ, leading to an immediate interrupt (unless the current instruction is *trap*, *put*, or *resume*).

```
⟨ Check for security violation, break if so 149 ⟩ ≡
{
    if (hot→loc.h & sign_bit) {
        if ((g[rK].o.h & P_BIT) ∧ ¬(hot→interrupt & P_BIT)) {
            hot→interrupt |= P_BIT;
            g[rQ].o.h |= P_BIT;
            new_Q.h |= P_BIT;
            if (verbose & issue_bit) {
                printf("␣setting␣rQ="); print_octa(g[rQ].o); printf("\n");
            }
            break;
        }
    } else if ((g[rK].o.h & #ff) ≠ #ff ∧ ¬(hot→interrupt & S_BIT)) {
        hot→interrupt |= S_BIT;
        g[rQ].o.h |= S_BIT;
        new_Q.h |= S_BIT;
        g[rK].o.h |= S_BIT;
        if (verbose & issue_bit) {
            printf("␣setting␣rQ="); print_octa(g[rQ].o);
            printf(",␣rK="); print_octa(g[rK].o); printf("\n");
        }
        break;
    }
}
```

This code is used in section 67.
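
The rule itself can be stated as a small predicate. This is a sketch only: the function name `security_violation` is hypothetical, the values of P_BIT and S_BIT are those listed for section 54, and we assume that *sign\_bit* denotes the leading bit of a tetrabyte. Unlike the simulator, the sketch does not remember whether the violation has already been recorded in *hot→interrupt*.

```c
#include <stdint.h>

#define P_BIT (1u << 0) /* value listed for section 54 */
#define S_BIT (1u << 1) /* value listed for section 54 */

/* Hypothetical helper: which interrupt bit, if any, the security rule raises.
   loc_h is the high tetrabyte of the instruction's location;
   rK_h is the high tetrabyte of the interrupt mask register rK. */
uint32_t security_violation(uint32_t loc_h, uint32_t rK_h) {
    if (loc_h & 0x80000000u)                    /* negative location (assumed sign bit) */
        return (rK_h & P_BIT) ? P_BIT : 0;      /* P_BIT must not be enabled */
    return ((rK_h & 0xff) != 0xff) ? S_BIT : 0; /* all eight bits must be enabled */
}
```

A nonnegative location with rK's high tetrabyte equal to #fe therefore yields S_BIT, while a negative location with the same mask yields no violation.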

150. Branch prediction. An MMIX programmer distinguishes statically between "branches" and "probable branches," but many modern computers attempt to do better by implementing dynamic branch prediction. (See, for example, section 4.3 of Hennessy and Patterson's *Computer Architecture*, second edition.) Experience has shown that dynamic branch prediction can significantly improve the performance of speculative execution, by reducing the number of instructions that need to be deissued.

This simulator has an optional *bp\_table* containing $2^{a+b+c}$ entries of n bits each, where n is between 1 and 8. Usually n is 1 or 2 in practice, but 8 bits are allocated per entry for convenience in this program. The *bp\_table* is consulted and updated on every branch instruction (every B or PB instruction, but not JMP), for advice on past history of similar situations. It is indexed by the a least significant bits of the address of the instruction, the b most recent bits of global branch history, and the next c bits of both address and history (exclusive-ored).

A *bp\_table* entry begins at zero and is regarded as a signed n-bit number. If it is nonnegative, we will follow the prediction in the instruction, namely to predict a branch taken only in the PB case. If it is negative, we will predict the opposite of the instruction's recommendation. The n-bit number is increased (if possible) if the instruction's prediction was correct, decreased (if possible) if the instruction's prediction was incorrect.

(Incidentally, a large value of n is not necessarily a good idea. For example, if n=8 the machine might need 128 steps to recognize that a branch taken the first 150 times is not taken the next 150 times. And if we modify the update criteria to avoid this problem, we obtain a scheme that is rarely better than a simple scheme with smaller n.)
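
The remark about n = 8 can be checked with a small model of a single table entry. This is a sketch under stated assumptions, not the simulator's code: the names `bp_entry`, `bp_entry_update`, and `bp_mispredictions` are hypothetical, and the entry is adjusted in one step after the outcome is known, whereas the simulator updates the table speculatively at issue time and corrects it later if necessary.

```c
/* Hypothetical stand-alone model of one table entry, assuming 1 <= n <= 8. */
typedef struct {
    int nmask;  /* 2^n - 1 */
    int npower; /* 2^(n-1), the sign bit of an n-bit number */
    int h;      /* the entry, an n-bit two's complement value */
} bp_entry;

void bp_entry_init(bp_entry *e, int n) {
    e->nmask = (1 << n) - 1;
    e->npower = 1 << (n - 1);
    e->h = 0; /* entries begin at zero */
}

/* Nonzero if we should go against the instruction's own recommendation. */
int bp_entry_overrides(const bp_entry *e) {
    return (e->h & e->npower) != 0; /* the entry is negative */
}

/* Adjust the entry once we know whether the recommendation was right. */
void bp_entry_update(bp_entry *e, int recommendation_was_correct) {
    if (recommendation_was_correct) {
        int up = (e->h + 1) & e->nmask;
        if (up != e->npower) e->h = up; /* stay at 2^(n-1) - 1 if already there */
    } else if (e->h != e->npower) {
        e->h = (e->h - 1) & e->nmask;   /* stay at -2^(n-1) if already there */
    }
}

/* Count mispredictions of a PB instruction that is taken `first` times
   and then not taken `second` times. */
int bp_mispredictions(int n, int first, int second) {
    bp_entry e;
    int k, bad = 0;
    bp_entry_init(&e, n);
    for (k = 0; k < first + second; k++) {
        int taken = (k < first);
        int predicted_taken = !bp_entry_overrides(&e); /* PB recommends "taken" */
        if (predicted_taken != taken) bad++;
        bp_entry_update(&e, taken); /* PB's recommendation is right iff taken */
    }
    return bad;
}
```

Under these assumptions `bp_mispredictions(8, 150, 150)` reports 128 mispredictions, while the same experiment with n = 2 reports only 2.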

The values a, b, c, and n in this discussion are called *bp\_a*, *bp\_b*, *bp\_c*, and *bp\_n* in the program.

```
⟨ External variables 4 ⟩ +≡
  Extern int bp_a, bp_b, bp_c, bp_n; /* parameters for branch prediction */
  Extern char *bp_table; /* either Λ or an array of 2^{a+b+c} items */
```

```
Extern = macro, §4.
g: specnode [], §86.
h: tetra, §17.
hot: control *, §60.
interrupt: unsigned int, §44.
issue_bit = 1 ≪ 0, §8.
loc: octa, §44.
new_Q: octa, §148.
o: octa, §40.
P_BIT = 1 ≪ 0, §54.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
put = 55, §49.
resume = 76, §49.
rK = 15, §52.
rQ = 16, §52.
S_BIT = 1 ≪ 1, §54.
sign_bit = macro, §80.
trap = 82, §49.
verbose: int, §4.
```

151. Branch prediction is made when we are either about to issue an instruction or peeking ahead. We look at the *bp\_table*, but we don't want to update it yet.

```
⟨ Predict a branch outcome 151 ⟩ ≡
{
    predicted = op & #10; /* start with the instruction's recommendation */
    if (bp_table) { register int h;
        m = ((head→loc.l & bp_cmask) ≪ bp_b) + (head→loc.l & bp_amask);
        m = ((cool_hist & bp_bcmask) ≪ bp_a) ⊕ (m ≫ 2);
        h = bp_table[m];
        if (h & bp_npower) predicted ⊕= #10;
    }
    if (predicted) peek_hist = (peek_hist ≪ 1) + 1;
    else peek_hist ≪= 1;
}
```

This code is used in section 85.

152. We update the *bp\_table* when an instruction is issued. And we store the opposite table value in *cool→x.o.l*, just in case our prediction turns out to be wrong.

```
⟨ Record the result of branch prediction 152 ⟩ ≡
  if (bp_table) { register int reversed, h, h_up, h_down;
    reversed = op & #10;
    if (peek_hist & 1) reversed ⊕= #10;
    m = ((head→loc.l & bp_cmask) ≪ bp_b) + (head→loc.l & bp_amask);
    m = ((cool_hist & bp_bcmask) ≪ bp_a) ⊕ (m ≫ 2);
    h = bp_table[m];
    h_up = (h + 1) & bp_nmask; if (h_up ≡ bp_npower) h_up = h;
    if (h ≡ bp_npower) h_down = h; else h_down = (h - 1) & bp_nmask;
    if (reversed) {
      bp_table[m] = h_down, cool→x.o.l = h_up;
      cool→i = pbr + br - cool→i; /* reverse the sense */
      bp_rev_stat++;
    } else {
      bp_table[m] = h_up, cool→x.o.l = h_down; /* go with the flow */
      bp_ok_stat++;
    }
    if (verbose & show_pred_bit) {
      printf("␣predicting␣"); print_octa(cool→loc);
      printf("␣%s;␣bp[%x]=%d\n", reversed ? "NG" : "OK", m,
          bp_table[m] - ((bp_table[m] & bp_npower) ≪ 1));
    }
    cool→x.o.h = m;
  }
```

This code is used in section 75.

**153.** The calculations in the previous sections need several precomputed constants, depending on the parameters a, b, c, and n.

```
⟨ Initialize everything 22 ⟩ +≡
  bp_amask = ((1 ≪ bp_a) - 1) ≪ 2; /* least a bits of instruction address */
  bp_cmask = ((1 ≪ bp_c) - 1) ≪ (bp_a + 2); /* the next c address bits */
  bp_bcmask = (1 ≪ (bp_b + bp_c)) - 1; /* least b + c bits of history info */
  bp_nmask = (1 ≪ bp_n) - 1; /* least significant n bits */
  bp_npower = 1 ≪ (bp_n - 1); /* 2^{n-1}, the sign bit of an n-bit number */

154. ⟨ Global variables 20 ⟩ +≡
  int bp_amask, bp_cmask, bp_bcmask, bp_nmask, bp_npower;
  int bp_rev_stat, bp_ok_stat; /* how often we overrode and agreed */
  int bp_bad_stat, bp_good_stat; /* how often we failed and succeeded */
```
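
To see how these masks combine, the index computation of sections 151 and 152 can be packaged as a function of the parameters. This is an illustrative sketch; the name `bp_index` and its argument list are assumptions made here, and only the low 32 bits of the address are considered, as in *head→loc.l*.

```c
/* Hypothetical helper: the table index for an instruction at address loc
   with global branch history hist, given the parameters a, b, and c. */
unsigned bp_index(unsigned loc, unsigned hist, int a, int b, int c) {
    unsigned amask = ((1u << a) - 1) << 2;       /* least a bits of the instruction address */
    unsigned cmask = ((1u << c) - 1) << (a + 2); /* the next c address bits */
    unsigned bcmask = (1u << (b + c)) - 1;       /* least b + c bits of history */
    unsigned m = ((loc & cmask) << b) + (loc & amask);
    return ((hist & bcmask) << a) ^ (m >> 2);    /* always less than 2^(a+b+c) */
}
```

With a = 2, b = 1, c = 1, an instruction at address #1c and history bits (10)₂ gives index 3: the low two bits 11 come from the address, the next bit 0 is the most recent history bit, and the leading bit is the exclusive-or of address bit 1 and history bit 1.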

**155.** After a branch or probable branch instruction has been issued and the value of the relevant register has been computed in the reorder buffer as *data→b.o*, we're ready to determine if the prediction was correct or not.

```
⟨ Cases for stage 1 execution 155 ⟩ ≡
  case br: case pbr: j = register_truth(data→b.o, data→op);
    if (j) data→go.o = data→z.o; else data→go.o = data→y.o;
    if (j ≡ (data→i ≡ pbr)) bp_good_stat++;
    else { /* oops, misprediction */
      bp_bad_stat++;
      ⟨ Recover from incorrect branch prediction 160 ⟩;
    }
    goto fin_ex;
```

See also sections 313, 325, 327, 328, 329, 331, and 356.

This code is used in section 132.

```
b: spec, §44.
bp_a: int, §150.
bp_b: int, §150.
bp_c: int, §150.
bp_n: int, §150.
bp_table: char *, §150.
br = 69, §49.
cool: control *, §60.
cool_hist: unsigned int, §99.
data: register control *, §124.
fin_ex: label, §144.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
j: register int, §12.
l: tetra, §17.
loc: octa, §68.
loc: octa, §44.
m: register int, §12.
o: octa, §40.
op: register mmix_opcode,
op: mmix_opcode, §44.
pbr = 70, §49.
peek_hist: unsigned int, §99.
predicted: register int, §85.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
register_truth: static int (), §157.
show_pred_bit = 1 ≪ 7, §8.
verbose: int, §4.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
```

**156.** The *register\_truth* subroutine is used by B, PB, CS, and ZS commands to decide whether an octabyte satisfies the conditions of the opcode, *data→op*.

```
⟨ Internal prototypes 13 ⟩ +≡
  static int register_truth ARGS((octa, mmix_opcode));

157. ⟨ Subroutines 14 ⟩ +≡
  static int register_truth(o, op)
      octa o;
      mmix_opcode op;
  { register int b;
    switch ((op ≫ 1) & #3) {
    case 0: b = o.h ≫ 31; break; /* negative? */
    case 1: b = (o.h ≡ 0 ∧ o.l ≡ 0); break; /* zero? */
    case 2: b = (o.h < sign_bit ∧ (o.h ∨ o.l)); break; /* positive? */
    case 3: b = o.l & #1; break; /* odd? */
    }
    if (op & #8) return b ⊕ 1;
    else return b;
  }
```

**158.** The *issued\_between* subroutine determines how many speculative instructions were issued between a given control block in the reorder buffer and the current *cool* pointer, when cc = cool.
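
The kind of counting involved can be illustrated with a rough sketch, which should not be taken as the author's code. It assumes, following section 145, that the reorder buffer is circular, occupying slots *reorder\_bot* through *reorder\_top*, and that issuing an instruction moves *cool* downward with wraparound. The name `issued_between_sketch` and the use of array indices in place of pointers are choices made only for this example.

```c
/* Hypothetical model: slots are numbered bot..top; issuing an instruction
   fills slot cc and then steps cc downward, wrapping from bot to top.
   Returns how many slots were filled after slot c and before slot cc. */
int issued_between_sketch(int c, int cc, int bot, int top) {
    if (c > cc) return c - 1 - cc;  /* no wraparound between c and cc */
    return (c - bot) + (top - cc);  /* the count wraps from bot around to top */
}
```

In a buffer with slots 0 through 7, for instance, two instructions lie between c = 5 and cc = 2 (slots 4 and 3), and two also lie between c = 1 and cc = 6 (slots 0 and 7).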

160. If more than one functional unit is able to process branch instructions and if two of them simultaneously discover misprediction, or if misprediction is detected by one unit just as another unit is generating an interrupt, we assume that an arbitration takes place so that only the hottest one actually deissues the cooler instructions.

Changes to the *bp\_table* aren't undone when they were made on speculation in an instruction being deissued; nor do we worry about cases where the same *bp\_table* entry is being updated by two or more active coroutines. After all, the *bp\_table* is just a heuristic, not part of the real computation. We correct the *bp\_table* only if we discover that a prediction was wrong, so that we will be less likely to make the same mistake later.

```
⟨ Recover from incorrect branch prediction 160 ⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0; /* clear the fetch buffer */
  ⟨ Restart the fetch coroutine 287 ⟩;
  inst_ptr.o = data→go.o, inst_ptr.p = Λ;
  if (¬(data→loc.h & sign_bit)) {
    if (inst_ptr.o.h & sign_bit) data→interrupt |= P_BIT;
    else data→interrupt &= ∼P_BIT;
  }
  if (bp_table) {
    bp_table[data→x.o.h] = data→x.o.l; /* this is what we should have stored */
    if (verbose & show_pred_bit) {
      printf("␣mispredicted␣"); print_octa(data→loc);
      printf(";␣bp[%x]=%d\n", data→x.o.h,
          data→x.o.l - ((data→x.o.l & bp_npower) ≪ 1));
    }
  }
  cool_hist = (j ? (data→hist ≪ 1) + 1 : data→hist ≪ 1);
```

This code is used in section 155.

```
161. ⟨ External prototypes 9 ⟩ +≡
  Extern void print_stats ARGS((void));

162. ⟨ External routines 10 ⟩ +≡
  void print_stats()
  {
    register int j;
    if (bp_table)
      printf("Predictions:␣%d␣in␣agreement,␣%d␣in␣opposition;␣%d␣good,␣%d␣bad\n",
          bp_ok_stat, bp_rev_stat, bp_good_stat, bp_bad_stat);
    else printf("Predictions:␣%d␣good,␣%d␣bad\n", bp_good_stat, bp_bad_stat);
    printf("Instructions␣issued␣per␣cycle:\n");
    for (j = 0; j ≤ dispatch_max; j++)
      printf("␣␣␣␣%d␣␣␣␣%d\n", j, dispatch_stat[j]);
  }
```

```
ARGS = macro, §6.
bp_bad_stat: int, §154.
bp_good_stat: int, §154.
bp_npower: int, §154.
bp_ok_stat: int, §154.
bp_rev_stat: int, §154.
bp_table: char *, §150.
control = struct, §44.
cool: control *, §60.
cool_hist: unsigned int, §99.
data: register control *, §124.
deissues: int, §60.
die: label, §144.
dispatch_max: int, §59.
dispatch_stat: int *, §66.
Extern = macro, §4.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
hist: unsigned int, §44.
i: register int, §12.
inst_ptr: spec, §284.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
loc: octa, §44.
mmix_opcode = enum, §47.
o: octa, §40.
octa = struct, §17.
old_tail: fetch *, §70.
op: mmix_opcode, §44.
p: specnode *, §40.
P_BIT = 1 ≪ 0, §54.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
reorder_bot: control *, §60.
reorder_top: control *, §60.
resuming: int, §78.
show_pred_bit = 1 ≪ 7, §8.
sign_bit = macro, §80.
tail: fetch *, §69.
verbose: int, §4.
x: specnode, §44.
```

163. Cache memory. It's time now to consider MMIX's MMU, the memory management unit. This part of the machine deals with the critical problem of getting data to and from the computational units. In a RISC architecture all interaction between main memory and the computer registers is specified by load and store instructions; thus memory accesses are much easier to deal with than they would be on a machine with more complex kinds of interaction. But memory management is still difficult, if we want to do it well, because main memory typically operates at a much slower speed than the registers do. High-speed implementations of MMIX introduce intermediate "caches" of storage in order to keep the most important data accessible, and cache maintenance can be complicated when all the details are taken into account. (See, for example, Chapter 5 of Hennessy and Patterson's Computer Architecture, second edition.)

This simulator can be configured to have up to three auxiliary caches between registers and memory: An I-cache for instructions, a D-cache for data, and an S-cache for both instructions and data. The S-cache, also called a *secondary cache*, is supported only if both I-cache and D-cache are present. Arbitrary access times for each cache can be specified independently; we might assume, for example, that data items in the I-cache or D-cache can be sent to a register in one or two clock cycles, but the access time for the S-cache might be say 5 cycles, and main memory might require 20 cycles or more. Our speculative pipeline can have many functional units handling load and store instructions, but only one load or store instruction can be updating the D-cache or S-cache or main memory at a time. (However, the D-cache can have several read ports; furthermore, data might be passing between the S-cache and memory while other data is passing between the reorder buffer and the D-cache.)

Besides the optional I-cache, D-cache, and S-cache, there are required caches called the IT-cache and DT-cache, for translation of virtual addresses to physical addresses. A translation cache is often called a "translation lookaside buffer" or TLB; but we call it a cache since it is implemented in nearly the same way as an I-cache.

**164.** Consider a cache that has blocks of $2^b$ bytes each and associativity $2^a$; here $b \ge 3$ and $a \ge 0$. The I-cache, D-cache, and S-cache are addressed by 48-bit physical addresses, as if they were part of main memory; but the IT and DT caches are addressed by 64-bit keys, obtained from a virtual address by blanking out the lower s bits and inserting the value of n, where the page size s and the process number n are found in rV. We will consider all caches to be addressed by 64-bit keys, so that both cases are handled with the same basic methods.

Given a 64-bit key, we ignore the low-order b bits and use the next c bits to address the cache set; then the remaining 64-b-c bits should match one of $2^a$ tags in that set. The case a=0 corresponds to a so-called direct-mapped cache; the case c=0 corresponds to a so-called fully associative cache. With $2^c$ sets of $2^a$ blocks each, and $2^b$ bytes per block, the cache contains $2^{a+b+c}$ bytes of data, in addition to the space needed for tags. Translation caches have b=3 and they also usually have c=0.
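
The decomposition of a key can be sketched as follows. This is an illustration only: the type `key_parts` and the functions `split_key` and `cache_bytes` are hypothetical names, the key is modeled as a single 64-bit integer rather than an **octa**, and we assume b + c < 64.

```c
#include <stdint.h>

/* Hypothetical decomposition of a 64-bit cache key, assuming b + c < 64. */
typedef struct {
    uint64_t offset; /* low-order b bits: byte position within the block */
    uint64_t set;    /* next c bits: which of the 2^c sets to search */
    uint64_t tag;    /* the key with its low b + c bits cleared */
} key_parts;

key_parts split_key(uint64_t key, int b, int c) {
    key_parts k;
    uint64_t tagmask = ~(((uint64_t)1 << (b + c)) - 1); /* -2^(b+c) */
    k.offset = key & (((uint64_t)1 << b) - 1);
    k.set = (key >> b) & (((uint64_t)1 << c) - 1);
    k.tag = key & tagmask;
    return k;
}

/* Data capacity in bytes: 2^c sets of 2^a blocks of 2^b bytes each. */
uint64_t cache_bytes(int a, int b, int c) {
    return (uint64_t)1 << (a + b + c);
}
```

With b = 5 and c = 7, for example, the key #12345678 has offset #18, set number #33, and tag #12345000; a two-way cache of that shape (a = 1) holds 8192 bytes of data.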

If a tag matches the specified bits, we "hit" in the cache and can use and/or update the data found there. Otherwise we "miss," and we probably want to replace one of the cache blocks by the block containing the item sought. The item chosen for replacement is called a *victim*. The choice of victim is forced when the cache is direct-mapped, but four strategies for victim selection are available when we must choose from among $2^a$ entries for a > 0:

- "Random" selection chooses the victim by extracting the least significant a bits of the clock.
- "Serial" selection chooses $0, 1, \ldots, 2^a-1, 0, 1, \ldots, 2^a-1, 0, \ldots$ on successive trials.
- "LRU (Least Recently Used)" selection chooses the victim that ranks last if items are ranked inversely to the time that has elapsed since their previous use.
- "Pseudo-LRU" selection chooses the victim by a rough approximation to LRU that is simpler to implement in hardware. It requires a bit table $r_1 \ldots r_{2^a-1}$. Whenever we use an item with binary address $(i_1 \ldots i_a)_2$ in the set, we adjust the bit table as follows:

$$r_1 \leftarrow 1 - i_1, \quad r_{1i_1} \leftarrow 1 - i_2, \quad \dots, \quad r_{1i_1 \dots i_{a-1}} \leftarrow 1 - i_a;$$

here the subscripts on r are binary numbers. (For example, when a=3, the use of element $(010)_2$ sets $r_1 \leftarrow 1$, $r_{10} \leftarrow 0$, $r_{101} \leftarrow 1$, where $r_{101}$ means the same as $r_5$.) To select a victim, we start with $l \leftarrow 1$ and then repeatedly set $l \leftarrow 2l + r_l$, a times; then we choose element $l-2^a$. When a=1, this scheme is equivalent to LRU. When a=2, this scheme was implemented in the Intel 80486 chip.
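
The pseudo-LRU rules above can be sketched in a few lines. The type `plru` and the functions `plru_use` and `plru_victim` are hypothetical names for this illustration, which assumes a ≤ 4 so that a fixed-size bit table suffices; it is not the simulator's implementation.

```c
/* Hypothetical pseudo-LRU state for one cache set of 2^a blocks, a <= 4.
   r[1] through r[2^a - 1] form the bit table; r[0] is unused. */
typedef struct {
    int a;
    unsigned char r[16];
} plru;

/* Record a use of the block with index i, where 0 <= i < 2^a. */
void plru_use(plru *s, int i) {
    int l = 1, k;
    for (k = s->a - 1; k >= 0; k--) {
        int bit = (i >> k) & 1; /* i_1 is the most significant bit of i */
        s->r[l] = 1 - bit;      /* point away from the block just used */
        l = 2 * l + bit;        /* the subscript 1 i_1 ... i_j, read in binary */
    }
}

/* Choose a victim: follow the bits from the root, then drop the leading 1. */
int plru_victim(const plru *s) {
    int l = 1, k;
    for (k = 0; k < s->a; k++) l = 2 * l + s->r[l];
    return l - (1 << s->a);
}
```

Starting from an all-zero table with a = 3, a use of element 2 sets r[1] = 1, r[2] = 0, and r[5] = 1, in agreement with the example in the text, and the next victim is then element 4.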

```
⟨ Type definitions 11 ⟩ +≡
typedef enum {
  random, serial, pseudo_lru, lru
} replace_policy;
```

165. A cache might also include a "victim" area, which contains the last $2^v$ victim blocks removed from the main cache area. The victim area can be searched in parallel with the specified cache set, thereby increasing the chance of a hit without making the search go slower. Each of the four replacement policies can be used also in the victim cache.

**166.** A cache also has a granularity $2^g$, where $b \ge g \ge 3$. This means that we maintain, for each cache block, a set of $2^{b-g}$ "dirty bits," which identify the $2^g$-byte groups that have possibly changed since they were last read from memory. Thus if g = b, an entire cache block is either dirty or clean; if g = 3, the dirtiness of each octabyte is maintained separately.

Two policies are available when new data is written into all or part of a cache block. We can write-through, meaning that we send all new data to memory immediately and never mark anything dirty; or we can write-back, meaning that we update the memory from the cache only when absolutely necessary. Furthermore we can write-allocate, meaning that we keep the new data in the cache, even if the cache block being written has to be fetched first because of a miss; or we can write-around, meaning that we keep the new data only if it was part of an existing cache block.

(In this discussion, "memory" is shorthand for "the next level of the memory hierarchy"; if there is an S-cache, the I-cache and D-cache write new data to the S-cache, not directly to memory. The I-cache, IT-cache, and DT-cache are read-only, so they do not need the facilities discussed in this section. Moreover, the D-cache and S-cache can be assumed to have the same granularity.)
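
The effect of granularity on the dirty bits can be sketched as follows, for a write-back cache (under write-through nothing would be marked). The helpers `mark_dirty` and `any_dirty` are hypothetical names for this illustration; as in the simulator, a full byte represents each dirty bit.

```c
/* Hypothetical helper: mark the granules touched by a write of len bytes
   starting at byte offset off within a block of 2^b bytes, where the
   granularity is 2^g and dirty[] has 2^(b-g) entries.
   Assumes len > 0 and off + len <= 2^b. */
void mark_dirty(char *dirty, int g, int off, int len) {
    int first = off >> g;            /* granule holding the first byte */
    int last = (off + len - 1) >> g; /* granule holding the last byte */
    int j;
    for (j = first; j <= last; j++) dirty[j] = 1;
}

/* A block is dirty if any of its 2^(b-g) granules is dirty. */
int any_dirty(const char *dirty, int b, int g) {
    int j;
    for (j = 0; j < (1 << (b - g)); j++)
        if (dirty[j]) return 1;
    return 0;
}
```

With b = 6 and g = 3, a block has eight dirty bits, and writing eight bytes at offset 12 marks granules 1 and 2; with g = b = 6 the same write would mark the single dirty bit of the whole block.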

```
⟨ Header definitions 6 ⟩ +≡
#define WRITE_BACK 1 /* use this if not write-through */
#define WRITE_ALLOC 2 /* use this if not write-around */
```

**167.** We have seen that many flavors of cache can be simulated. They are represented by **cache** structures, containing arrays of **cacheset** structures that contain arrays of **cacheblock** structures for the individual blocks. We use a full byte to store each *dirty* bit, and we use full integer words to store *rank* fields for LRU processing, etc.; memory economy is less important than simplicity in this simulator.

```
⟨Type definitions 11⟩ +≡
  typedef struct {
    octa tag;     /* bits of key not included in the cache block address */
    char *dirty;  /* array of 2^{b-g} dirty bits, one per granule */
    octa *data;   /* array of 2^{b-3} octabytes, the data in a cache block */
    int rank;     /* auxiliary information for non-random policies */
  } cacheblock;

  typedef cacheblock *cacheset;  /* array of 2^a or 2^v blocks */

  typedef struct {
    int a, b, c, g, v;
      /* lg of associativity, blocksize, setsize, granularity, and victimsize */
    int aa, bb, cc, gg, vv;
      /* associativity, blocksize, setsize, granularity, and victimsize (all powers of 2) */
    int tagmask;                /* -2^{b+c} */
    replace_policy repl, vrepl; /* how to choose victims and victim-victims */
    int mode;                   /* optional WRITE_BACK and/or WRITE_ALLOC */
    int access_time;            /* cycles to know if there's a hit */
    int copy_in_time;           /* cycles to copy a new block into the cache */
    int copy_out_time;          /* cycles to copy an old block from the cache */
    cacheset *set;              /* array of 2^c sets of arrays of cache blocks */
    cacheset victim;            /* the victim cache, if present */
    coroutine filler;           /* a coroutine for copying new blocks into the cache */
    control filler_ctl;         /* its control block */
    coroutine flusher;          /* a coroutine for writing dirty old data from the cache */
    control flusher_ctl;        /* its control block */
    cacheblock inbuf;           /* filling comes from here */
    cacheblock outbuf;          /* flushing goes to here */
    lockvar lock;               /* nonzero when the cache is being changed significantly */
    lockvar fill_lock;          /* nonzero when filler should pass data back */
    int ports;                  /* how many coroutines can be reading the cache? */
    coroutine *reader;          /* array of coroutines that might be reading simultaneously */
    char *name;                 /* "Icache", for example */
  } cache;
```

**168.**

```
⟨External variables 4⟩ +≡
  Extern cache *Icache, *Dcache, *Scache, *ITcache, *DTcache;
```

**169.** Now we are ready to define some basic subroutines for cache maintenance. Let's begin with a trivial routine that tests if a given cache block is dirty.

```
⟨Internal prototypes 13⟩ +≡
  static bool is_dirty ARGS((cache *, cacheblock *));
```

**170.**

```
⟨Subroutines 14⟩ +≡
  static bool is_dirty(c, p)
    cache *c;       /* the cache containing it */
    cacheblock *p;  /* a cache block */
  {
    register int j;
    register char *d = p->dirty;
    for (j = 0; j < c->bb; d++, j += c->gg)
      if (*d) return true;
    return false;
  }
```
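The granule bookkeeping behind *is\_dirty* can be tried in isolation. What follows is only a minimal sketch under stated assumptions, not part of the simulator: the names `toy_block`, `toy_mark_dirty`, `toy_is_dirty`, `toy_clear`, and the bound `TOY_MAX_GRANULES` are hypothetical, and the block size `bb` and granule size `gg` are assumed to be powers of 2 with `gg` no larger than `bb`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_GRANULES 64   /* hypothetical upper bound for this sketch */

typedef struct {
  char dirty[TOY_MAX_GRANULES];  /* one full byte per granule, as in the text */
} toy_block;

/* reset all dirty flags */
static void toy_clear(toy_block *p) { memset(p->dirty, 0, sizeof p->dirty); }

/* mark the granule containing byte offset off (0 <= off < bb) as dirty */
static void toy_mark_dirty(toy_block *p, int off, int bb, int gg)
{
  assert(gg > 0 && gg <= bb && bb / gg <= TOY_MAX_GRANULES);
  assert(off >= 0 && off < bb);
  p->dirty[off / gg] = 1;
}

/* same loop shape as is_dirty: j steps through the block one granule at a time */
static bool toy_is_dirty(const toy_block *p, int bb, int gg)
{
  int j;
  const char *d = p->dirty;
  for (j = 0; j < bb; d++, j += gg)
    if (*d) return true;
  return false;
}
```

With 64-byte blocks and 8-byte granules, a write to byte offset 17 lands in granule 2, so only `dirty[2]` is set and the loop inspects at most 8 flags.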

**171.** For diagnostic purposes we might want to display an entire cache block.

```
⟨Internal prototypes 13⟩ +≡
  static void print_cache_block ARGS((cacheblock, cache *));
```

**172.**

```
⟨Subroutines 14⟩ +≡
  static void print_cache_block(p, c)
    cacheblock p;
    cache *c;
  { register int i, j, b = c->bb >> 3, g = c->gg >> 3;
    printf("%08x%08x: ", p.tag.h, p.tag.l);
    for (i = j = 0; j < b; j++, i += ((j & (g - 1)) ? 0 : 1))
      printf("%08x%08x%c", p.data[j].h, p.data[j].l, p.dirty[i] ? '*' : ' ');
    printf(" (%d)\n", p.rank);
  }
```

**173.**

```
⟨Internal prototypes 13⟩ +≡
  static void print_cache_locks ARGS((cache *));
```

**174.**

```
⟨Subroutines 14⟩ +≡
  static void print_cache_locks(c)
    cache *c;
  {
    if (c) {
      if (c->lock)
        printf("%s locked by %s:%d\n", c->name, c->lock->name, c->lock->stage);
      if (c->fill_lock)
        printf("%sfill locked by %s:%d\n", c->name, c->fill_lock->name, c->fill_lock->stage);
    }
  }
```

**175.** The *print\_cache* routine prints the entire contents of a cache. This can be a huge amount of data, but it can be very useful when debugging. Fortunately, the task of debugging favors the use of small caches, since interesting cases arise more often when a cache is fairly small.

```
⟨External prototypes 9⟩ +≡
  Extern void print_cache ARGS((cache *, bool));
```

**176.**

```
⟨External routines 10⟩ +≡
  void print_cache(c, dirty_only)
    cache *c;
    bool dirty_only;
  {
    if (c) { register int i, j;
      printf("%s of %s:", dirty_only ? "Dirty blocks" : "Contents", c->name);
      if (c->filler.next) {
        printf(" (filling ");
        print_octa(c->name[1] == 'T' ? c->filler_ctl.y.o : c->filler_ctl.z.o);
        printf(")");
      }
      if (c->flusher.next) {
        printf(" (flushing ");
        print_octa(c->outbuf.tag);
        printf(")");
      }
      printf("\n");
      ⟨Print all of c's cache blocks 177⟩;
    }
  }
```

**177.** We don't print the cache blocks that have an invalid tag, unless requested to be verbose.

```
⟨Print all of c's cache blocks 177⟩ ≡
  for (i = 0; i < c->cc; i++)
    for (j = 0; j < c->aa; j++)
      if ((!(c->set[i][j].tag.h & sign_bit) || (verbose & show_wholecache_bit)) &&
          (!dirty_only || is_dirty(c, &c->set[i][j]))) {
        printf("[%d][%d] ", i, j);
        print_cache_block(c->set[i][j], c);
      }
  for (j = 0; j < c->vv; j++)
    if ((!(c->victim[j].tag.h & sign_bit) || (verbose & show_wholecache_bit)) &&
        (!dirty_only || is_dirty(c, &c->victim[j]))) {
      printf("V[%d] ", j);
      print_cache_block(c->victim[j], c);
    }
```

This code is used in section 176.


**178.** The *clean\_block* routine simply initializes a given cache block.

```
⟨External prototypes 9⟩ +≡
  Extern void clean_block ARGS((cache *, cacheblock *));
```

**179.**

```
⟨External routines 10⟩ +≡
  void clean_block(c, p)
    cache *c;
    cacheblock *p;
  {
    register int j;
    p->tag.h = sign_bit, p->tag.l = 0;
    for (j = 0; j < c->bb >> 3; j++) p->data[j] = zero_octa;
    for (j = 0; j < c->bb >> c->g; j++) p->dirty[j] = false;
  }
```

**180.** The *zap\_cache* routine invalidates all tags of a given cache, effectively restoring it to its initial condition.

```
⟨External prototypes 9⟩ +≡
Extern void zap_cache ARGS((cache *));
```

**181.** We clear the *dirty* entries here, just to be tidy, although they could actually be left in arbitrary condition when the tags are invalid.

```
⟨External routines 10⟩ +≡
  void zap_cache(c)
    cache *c;
  {
    register int i, j;
    for (i = 0; i < c->cc; i++)
      for (j = 0; j < c->aa; j++) {
        clean_block(c, &(c->set[i][j]));
      }
    for (j = 0; j < c->vv; j++) {
      clean_block(c, &(c->victim[j]));
    }
  }
```

**182.** The *get\_reader* subroutine finds the index of an available reader coroutine for a given cache, or returns a negative value if no readers are available.

```
⟨Internal prototypes 13⟩ +≡
  static int get_reader ARGS((cache *));
```

**183.**

```
⟨Subroutines 14⟩ +≡
  static int get_reader(c)
    cache *c;
  { register int j;
    for (j = 0; j < c->ports; j++)
      if (c->reader[j].next == NULL) return j;
    return -1;
  }
```

**184.** The subroutine *copy\_block*(c, p, cc, pp) copies the dirty items from block p of cache c into block pp of cache cc, assuming that the destination cache has a sufficiently large block size. (In other words, we assume that $cc\rightarrow b \geq c\rightarrow b$.) We also assume that both blocks have compatible tags, and that both caches have the same granularity.

```
⟨Internal prototypes 13⟩ +≡
  static void copy_block ARGS((cache *, cacheblock *, cache *, cacheblock *));
```

**185.**

```
⟨Subroutines 14⟩ +≡
  static void copy_block(c, p, cc, pp)
    cache *c, *cc;
    cacheblock *p, *pp;
  {
    register int j, jj, i, ii, lim;
    register int off = p->tag.l & (cc->bb - 1);
    if (c->g != cc->g || p->tag.h != pp->tag.h || p->tag.l - off != pp->tag.l)
      panic(confusion("copy block"));
    for (j = 0, jj = off >> c->g; j < c->bb >> c->g; j++, jj++)
      if (p->dirty[j]) {
        pp->dirty[jj] = true;
        for (i = j << (c->g - 3), ii = jj << (c->g - 3), lim = (j + 1) << (c->g - 3);
             i < lim; i++, ii++)
          pp->data[ii] = p->data[i];
      }
  }
```
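The index arithmetic of *copy\_block* is easy to get wrong, so here is a small sketch of just that part. It is an illustration under assumptions, not the simulator's code: `toy_dest_granule` and `toy_first_octa` are hypothetical helper names, the destination block size is assumed to be a power of 2, and `g` is the common lg of the granularity, at least 3 so that every granule holds whole octabytes.

```c
#include <assert.h>
#include <stdint.h>

/* index of the destination granule that receives source granule j */
static int toy_dest_granule(uint32_t src_tag_l, int dst_bb, int g, int j)
{
  uint32_t off;
  assert(dst_bb > 0 && (dst_bb & (dst_bb - 1)) == 0);  /* power of 2 */
  assert(g >= 3);                                       /* whole octabytes per granule */
  off = src_tag_l & (uint32_t)(dst_bb - 1);  /* byte offset inside the larger block */
  return (int)(off >> g) + j;
}

/* index of the first octabyte of granule j within its block's data array */
static int toy_first_octa(int g, int j)
{
  assert(g >= 3);
  return j << (g - 3);
}
```

For instance, a source block whose tag has low half `0x1230`, copied into a cache with 64-byte blocks and 8-byte granules, starts at byte offset 48 of the destination block, so source granule 1 lands in destination granule 7.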


**186.** The *choose\_victim* subroutine selects the victim to be replaced when we need to change a cache set. We need only one bit of the *rank* fields to implement the r table when *policy* = *pseudo\_lru*, and we don't need *rank* at all when *policy* = *random*. Of course we use an a-bit counter to implement *policy* = *serial*. In the other case, *policy* = *lru*, we need an a-bit *rank* field; the least recently used entry has rank 0, and the most recently used entry has rank $2^a - 1 = aa - 1$.

```
⟨Internal prototypes 13⟩ +≡
  static cacheblock *choose_victim ARGS((cacheset, int, replace_policy));
```

**187.**

```
⟨Subroutines 14⟩ +≡
  static cacheblock *choose_victim(s, aa, policy)
    cacheset s;
    int aa;  /* setsize */
    replace_policy policy;
  {
    register cacheblock *p;
    register int l, m;
    switch (policy) {
    case random: return &s[ticks.l & (aa - 1)];
    case serial: l = s[0].rank; s[0].rank = (l + 1) & (aa - 1); return &s[l];
    case lru:
      for (p = s; p < s + aa; p++)
        if (p->rank == 0) return p;
      panic(confusion("lru victim"));  /* what happened? nobody has rank zero */
    case pseudo_lru:
      for (l = 1, m = aa >> 1; m; m >>= 1) l = l + l + s[l].rank;
      return &s[l - aa];
    }
  }
```

**188.** The *note\_usage* subroutine updates the *rank* entries to record the fact that a particular block in a cache set is now being used.

```
⟨Internal prototypes 13⟩ +≡
  static void note_usage ARGS((cacheblock *, cacheset, int, replace_policy));
```

**189.**

```
⟨Subroutines 14⟩ +≡
  static void note_usage(l, s, aa, policy)
    cacheblock *l;  /* a cache block that's probably worth preserving */
    cacheset s;     /* the set that contains l */
    int aa;         /* setsize */
    replace_policy policy;
  {
    register cacheblock *p;
    register int j, m, r;
    if (aa == 1 || policy <= serial) return;
    if (policy == lru) {
      r = l->rank;
      for (p = s; p < s + aa; p++)
        if (p->rank > r) p->rank--;
      l->rank = aa - 1;
    }
    else {  /* policy == pseudo_lru */
      r = l - s;
      for (j = 1, m = aa >> 1; m; m >>= 1)
        if (r & m) s[j].rank = 0, j = j + j + 1;
        else s[j].rank = 1, j = j + j;
    }
    return;
  }
```

**190.** The *demote\_usage* subroutine is sort of the opposite of *note\_usage*; it changes the rank of a given block to least recently used.

```
⟨Internal prototypes 13⟩ +≡
  static void demote_usage ARGS((cacheblock *, cacheset, int, replace_policy));
```

**191.**

```
⟨Subroutines 14⟩ +≡
  static void demote_usage(l, s, aa, policy)
    cacheblock *l;  /* a cache block we probably don't need */
    cacheset s;     /* the set that contains l */
    int aa;         /* setsize */
    replace_policy policy;
  {
    register cacheblock *p;
    register int j, m, r;
    if (aa == 1 || policy <= serial) return;
    if (policy == lru) {
      r = l->rank;
      for (p = s; p < s + aa; p++)
        if (p->rank < r) p->rank++;
      l->rank = 0;
    }
    else {  /* policy == pseudo_lru */
      r = l - s;
      for (j = 1, m = aa >> 1; m; m >>= 1)
        if (r & m) s[j].rank = 1, j = j + j + 1;
        else s[j].rank = 0, j = j + j;
    }
    return;
  }
```
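The pseudo-LRU case is the least obvious of the four policies, so a stand-alone sketch may help. It keeps the same loops as the *pseudo\_lru* branches above but stores the bits in a plain array instead of the *rank* fields of a cache set. Everything named `toy_...` and the constant `TOY_AA` are hypothetical, and the associativity is assumed to be 4.

```c
#include <assert.h>

#define TOY_AA 4   /* hypothetical associativity; must be a power of 2 */

/* toy_rank[1..TOY_AA-1] holds one bit per internal node of a binary tree;
   node j has children 2j and 2j+1, and the leaves TOY_AA..2*TOY_AA-1
   stand for blocks 0..TOY_AA-1.  toy_rank[0] is unused. */
static int toy_rank[TOY_AA];

/* follow the bits from the root down to a leaf; that leaf is the victim */
static int toy_choose_victim(void)
{
  int l, m;
  for (l = 1, m = TOY_AA >> 1; m; m >>= 1) l = l + l + toy_rank[l];
  return l - TOY_AA;
}

/* record a use of block r: every bit on its path now points away from it */
static void toy_note_usage(int r)
{
  int j, m;
  assert(r >= 0 && r < TOY_AA);
  for (j = 1, m = TOY_AA >> 1; m; m >>= 1)
    if (r & m) toy_rank[j] = 0, j = j + j + 1;
    else toy_rank[j] = 1, j = j + j;
}

/* the opposite: every bit on the path now points toward block r */
static void toy_demote_usage(int r)
{
  int j, m;
  assert(r >= 0 && r < TOY_AA);
  for (j = 1, m = TOY_AA >> 1; m; m >>= 1)
    if (r & m) toy_rank[j] = 1, j = j + j + 1;
    else toy_rank[j] = 0, j = j + j;
}
```

Starting from all-zero bits and always using the block that was just chosen, the victims come out in the order 0, 2, 1, 3. A call of `toy_demote_usage(r)` makes block `r` the next victim regardless of the earlier history.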

**192.** The *cache\_search* routine looks for a given key $\alpha$ in a given cache, and returns a cache block if there's a hit; otherwise it returns $\Lambda$ (the null pointer, written NULL in the code). If the search hits, the set in which the block was found is stored in global variable *hit\_set*. Notice that we need to check more bits of the tag when we search in the victim area.

```
#define cache_addr(c, alf) c->set[(alf.l & ~(c->tagmask)) >> c->b]

⟨Internal prototypes 13⟩ +≡
  static cacheblock *cache_search ARGS((cache *, octa));
```

**193.**

```
⟨Subroutines 14⟩ +≡
  static cacheblock *cache_search(c, alf)
    cache *c;  /* the cache to be searched */
    octa alf;  /* the key */
  {
    register cacheset s;
    register cacheblock *p;
    s = cache_addr(c, alf);  /* the set corresponding to alf */
    for (p = s; p < s + c->aa; p++)
      if (((p->tag.l ^ alf.l) & c->tagmask) == 0 && p->tag.h == alf.h) goto hit;
    s = c->victim;
    if (!s) return NULL;  /* cache miss, and no victim area */
    for (p = s; p < s + c->vv; p++)
      if (((p->tag.l ^ alf.l) & (-c->bb)) == 0 && p->tag.h == alf.h) goto hit;
    return NULL;  /* double miss */
  hit: hit_set = s; return p;
  }
```

**194.**

```
⟨Global variables 20⟩ +≡
  cacheset hit_set;
```

**195.** If *p* = *cache\_search*(c, alf) hits and if we call *use\_and\_fix*(c, p) immediately afterwards, cache c is updated to record the usage of key alf. A hit in the victim area moves the cache block to the main area, unless the *filler* routine of cache c is active. A pointer to the (possibly moved) cache block is returned.

```
⟨Internal prototypes 13⟩ +≡
  static cacheblock *use_and_fix ARGS((cache *, cacheblock *));
```

**196.**

```
⟨Subroutines 14⟩ +≡
  static cacheblock *use_and_fix(c, p)
    cache *c;
    cacheblock *p;
  {
    if (hit_set != c->victim) note_usage(p, hit_set, c->aa, c->repl);
    else {
      note_usage(p, hit_set, c->vv, c->vrepl);  /* found in victim cache */
      if (!c->filler.next) {
        register cacheset s = cache_addr(c, p->tag);
        register cacheblock *q = choose_victim(s, c->aa, c->repl);
        note_usage(q, s, c->aa, c->repl);
        ⟨Swap cache blocks p and q 197⟩;
        return q;
      }
    }
    return p;
  }
```
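To see why the victim area needs a wider comparison, it may help to work through the masks on a concrete geometry. The following is a sketch under assumptions, not the simulator's code: `toy_geometry` and the `toy_...` functions are hypothetical, only the low 32 bits of a key are modeled, and `c` denotes the lg of the number of sets as in the *cache* structure.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int b, c;          /* lg of blocksize and of the number of sets */
  uint32_t tagmask;  /* -2^(b+c), as an unsigned 32-bit pattern */
} toy_geometry;

static toy_geometry toy_make(int b, int c)
{
  toy_geometry geo;
  assert(b >= 3 && c >= 0 && b + c < 32);
  geo.b = b; geo.c = c;
  geo.tagmask = ~(((uint32_t)1 << (b + c)) - 1);  /* all bits above the set index */
  return geo;
}

/* which set does the low half of a key belong to?  (compare cache_addr) */
static uint32_t toy_set_index(const toy_geometry *geo, uint32_t alf_l)
{
  return (alf_l & ~geo->tagmask) >> geo->b;
}

/* main-area comparison: ignores the set index and the offset within the block */
static int toy_tag_match(const toy_geometry *geo, uint32_t tag_l, uint32_t alf_l)
{
  return ((tag_l ^ alf_l) & geo->tagmask) == 0;
}

/* victim-area comparison: ignores only the offset within the block */
static int toy_victim_match(const toy_geometry *geo, uint32_t tag_l, uint32_t alf_l)
{
  uint32_t bb = (uint32_t)1 << geo->b;
  return ((tag_l ^ alf_l) & (0u - bb)) == 0;
}
```

With 32-byte blocks and 16 sets (`b` = 5, `c` = 4), key `0x12345678` belongs to set 3. Inside that set the tag `0x12345600` matches, because every block stored there already agrees on the set-index bits. In the victim area, which mixes blocks from all sets, the same tag does not match; the set-index bits must be compared too.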

**197.** We can simply permute the pointers inside the cacheblock structures of a cache, instead of copying the data, if we are careful not to let any of those pointers escape into other data structures.

```
⟨Swap cache blocks p and q 197⟩ ≡
  {
    octa t;
    register char *d = p->dirty;
    register octa *dd = p->data;
    t = p->tag; p->tag = q->tag; q->tag = t;
    p->dirty = q->dirty; q->dirty = d;
    p->data = q->data; q->data = dd;
  }
```
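The pointer-permuting idea can be checked with a tiny stand-alone sketch. The type `toy_block` and the function `toy_swap` are hypothetical, and a 64-bit integer stands in for the *octa* tag; the *rank* field is deliberately left alone, as in the swap above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint64_t tag;    /* stands in for the octa tag */
  char *dirty;     /* separately allocated dirty flags */
  uint64_t *data;  /* separately allocated block contents */
  int rank;        /* not touched by the swap */
} toy_block;

/* exchange the contents of two blocks by exchanging pointers, not bytes */
static void toy_swap(toy_block *p, toy_block *q)
{
  uint64_t t;
  char *d;
  uint64_t *dd;
  assert(p != NULL && q != NULL);
  t = p->tag;   p->tag = q->tag;     q->tag = t;
  d = p->dirty; p->dirty = q->dirty; q->dirty = d;
  dd = p->data; p->data = q->data;   q->data = dd;
}
```

The cost of the swap is independent of the block size, which is the point of the technique. The price is the caveat stated above: any pointer to `p->data` saved elsewhere before the swap would afterwards refer to the other block's contents.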

This code is used in sections 196 and 205.

**198.** The *demote\_and\_fix* routine is analogous to *use\_and\_fix*, except that we don't want to promote the data we found.

```
⟨Internal prototypes 13⟩ +≡
  static cacheblock *demote_and_fix ARGS((cache *, cacheblock *));
```

**199.**

```
⟨Subroutines 14⟩ +≡
  static cacheblock *demote_and_fix(c, p)
    cache *c;
    cacheblock *p;
  {
    if (hit_set != c->victim) demote_usage(p, hit_set, c->aa, c->repl);
    else demote_usage(p, hit_set, c->vv, c->vrepl);
    return p;
  }
```


**200.** The subroutine *load\_cache*(c, p) is called at a moment when $c\rightarrow lock$ has been set and $c\rightarrow inbuf$ has been filled with clean data to be placed in the cache block p.

```
⟨Internal prototypes 13⟩ +≡
  static void load_cache ARGS((cache *, cacheblock *));
```

**201.**

```
⟨Subroutines 14⟩ +≡
  static void load_cache(c, p)
    cache *c;
    cacheblock *p;
  {
    register int i;
    register octa *d;
    for (i = 0; i < c->bb >> c->g; i++) p->dirty[i] = false;
    d = p->data; p->data = c->inbuf.data; c->inbuf.data = d;
    p->tag = c->inbuf.tag;
    hit_set = cache_addr(c, p->tag); use_and_fix(c, p);  /* p not moved */
  }
```

**202.** The subroutine *flush\_cache*(c, p, keep) is called at a "quiet" moment when $c\rightarrow flusher.next = \Lambda$. It puts cache block p into $c\rightarrow outbuf$ and fires up the $c\rightarrow flusher$ coroutine, which will take care of sending the data to lower levels of the memory hierarchy. Cache block p is also marked clean.

```
⟨Internal prototypes 13⟩ +≡
  static void flush_cache ARGS((cache *, cacheblock *, bool));
```

**203.**

```
⟨Subroutines 14⟩ +≡
  static void flush_cache(c, p, keep)
    cache *c;
    cacheblock *p;  /* a block inside cache c */
    bool keep;      /* should we preserve the data in p? */
  {
    register octa *d;
    register char *dd;
    register int j;
    c->outbuf.tag = p->tag;
    if (keep) for (j = 0; j < c->bb >> 3; j++) c->outbuf.data[j] = p->data[j];
    else d = c->outbuf.data, c->outbuf.data = p->data, p->data = d;
    dd = c->outbuf.dirty, c->outbuf.dirty = p->dirty, p->dirty = dd;
    for (j = 0; j < c->bb >> c->g; j++) p->dirty[j] = false;
    startup(&c->flusher, c->copy_out_time);  /* will not be aborted */
  }
```

**204.** The *alloc\_slot* routine is called when we wish to put new information into a cache after a cache miss. It returns a pointer to a cache block in the main area where the new information should be put. The tag of that cache block is invalidated; the calling routine should take care of filling it and giving it a valid tag in due time. The cache's *filler* routine should not be active when *alloc\_slot* is called.

Inserting new information might also require writing old information into the next level of the memory hierarchy, if the block being replaced is dirty. This routine returns $\Lambda$ in such cases if the cache is flushing a previously discarded block. Otherwise it schedules the *flusher* coroutine.

This routine returns $\Lambda$ also if the given key happens to be in the cache. Such cases are rare, but the following scenario shows that they aren't impossible: Suppose the DT-cache access time is 5, the D-cache access time is 1, and two processes simultaneously look for the same physical address. One process hits in DT-cache but misses in D-cache, waiting 5 cycles before trying *alloc\_slot* in the D-cache; meanwhile the other process missed in D-cache but didn't need to use the DT-cache, so it might have updated the D-cache.

A key value is never negative. Therefore we can invalidate the tag in the chosen slot by forcing it to be negative.

```
⟨Internal prototypes 13⟩ +≡
  static cacheblock *alloc_slot ARGS((cache *, octa));
```

**205.**

```
⟨Subroutines 14⟩ +≡
  static cacheblock *alloc_slot(c, alf)
    cache *c;
    octa alf;  /* key that probably isn't in the cache */
  {
    register cacheset s;
    register cacheblock *p, *q;
    if (cache_search(c, alf)) return NULL;
    if (c->flusher.next && c->outbuf.tag.h == alf.h &&
        !((c->outbuf.tag.l ^ alf.l) & c->tagmask))
      return NULL;
    s = cache_addr(c, alf);  /* the set corresponding to alf */
    if (c->victim) p = choose_victim(c->victim, c->vv, c->vrepl);
    else p = choose_victim(s, c->aa, c->repl);
    if (is_dirty(c, p)) {
      if (c->flusher.next) return NULL;
      flush_cache(c, p, false);
    }
    if (c->victim) {
      q = choose_victim(s, c->aa, c->repl);
      ⟨Swap cache blocks p and q 197⟩;
      q->tag.h |= sign_bit;  /* invalidate the tag */
      return q;
    }
    p->tag.h |= sign_bit; return p;
  }
```
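The invalidation trick rests on one observation: a tag whose sign bit is set can never compare equal to a real key. A minimal sketch follows, under the assumption that *sign\_bit* is the high-order bit of the high 32-bit half; the constant `TOY_SIGN_BIT`, the type `toy_octa`, and the `toy_...` functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SIGN_BIT 0x80000000u  /* assumed: high-order bit of the high half */

typedef struct { uint32_t h, l; } toy_octa;  /* high and low halves of a 64-bit key */

/* make the tag negative, so that no valid key can match it */
static void toy_invalidate(toy_octa *tag) { tag->h |= TOY_SIGN_BIT; }

static int toy_is_valid(const toy_octa *tag) { return !(tag->h & TOY_SIGN_BIT); }

/* full comparison of a stored tag against a (nonnegative) key */
static int toy_same_key(const toy_octa *tag, const toy_octa *key)
{
  assert(toy_is_valid(key));  /* keys are never negative */
  return tag->h == key->h && tag->l == key->l;
}
```

Because only one bit changes, the rest of the old tag is still readable after invalidation, which is convenient when a whole cache is displayed for debugging.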


**206.** Simulated memory. How should we deal with the potentially gigantic memory of MMIX? We can't simply declare an array m that has  $2^{48}$  bytes. (Indeed, up to  $2^{63}$  bytes are needed, if we consider also the physical addresses  $\geq 2^{48}$  that are reserved for memory-mapped input/output.)

We could regard memory as a special kind of cache, in which every access is required to hit. For example, such an "M-cache" could be fully associative, with  $2^a$  blocks each having a different tag; simulation could proceed until more than  $2^a - 1$  tags are required. But then the predefined value of a might well be so large that the sequential search of our  $cache\_search$  routine would be too slow.

Instead, we will allocate memory in chunks of  $2^{16}$  bytes at a time, as needed, and we will use hashing to search for the relevant chunk whenever a physical address is given. If the address is  $2^{48}$  or greater, special routines called  $spec\_read$  and  $spec\_write$ , supplied by the user, will be called upon to do the reading or writing. Otherwise the 48-bit address consists of a 32-bit chunk address and a 16-bit chunk offset.

Chunk addresses that are not used take no space in this simulator. But if, say, 1000 such patterns occur, the simulator will dynamically allocate approximately 65MB for the portions of main memory that are used. Parameter *mem\_chunks\_max* specifies the largest number of different chunk addresses that are supported. This parameter does not constrain the range of simulated physical addresses, which cover the entire 256 large-terabyte range permitted by MMIX.

```
⟨Type definitions 11⟩ +≡
  typedef struct {
    tetra tag;    /* 32-bit chunk address */
    octa *chunk;  /* either Λ or an array of 2^{13} octabytes */
  } chunknode;
```
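The split of a 48-bit physical address into a chunk address and a chunk offset can be sketched as follows. The key formula mirrors the one used by the *mem\_read* routine of this program, but the surrounding names (`toy_octa`, `toy_chunk_key`, `toy_chunk_offset`) are hypothetical, and the sketch assumes the address is below $2^{48}$.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } toy_octa;  /* high and low halves of an address */

/* 32-bit chunk key: address bits 16..31 stay in place and
   address bits 32..47 are placed in the low 16 bits */
static uint32_t toy_chunk_key(toy_octa addr)
{
  assert(addr.h < 0x10000u);  /* larger addresses are memory-mapped I/O */
  return (addr.l & 0xffff0000u) + addr.h;
}

/* index of the addressed octabyte within its chunk of 2^13 octabytes */
static uint32_t toy_chunk_offset(toy_octa addr)
{
  return (addr.l & 0xffffu) >> 3;
}
```

The key is not the chunk address read as a number; it is a rearrangement of the same 32 bits. That is harmless, since the key serves only as a unique name for the chunk when hashing.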

**207.** The parameter  $hash\_prime$  should be a prime number larger than the parameter  $mem\_chunks\_max$ , preferably more than twice as large but not much bigger than that. The default values  $mem\_chunks\_max = 1000$  and  $hash\_prime = 2003$  are set by  $MMIX\_config$  unless the user specifies otherwise.

```
⟨External variables 4⟩ +≡
Extern int mem_chunks; /* this many chunks are allocated so far */
Extern int mem_chunks_max; /* up to this many different chunks per run */
Extern int hash_prime; /* larger than mem_chunks_max, but not enormous */
Extern chunknode *mem_hash; /* the simulated main memory */
```

**208.** The separately compiled procedures  $spec\_read()$  and  $spec\_write()$  have the same calling conventions as the general procedures  $mem\_read()$  and  $mem\_write()$ , but with an additional size parameter, which specifies that  $1 \ll size$  bytes should be read or written.

```
⟨Subroutines 14⟩ +≡
extern octa spec_read ARGS((octa addr, int size)); /* for memory mapped I/O */
extern void spec_write ARGS((octa addr, octa val, int size)); /* likewise */
```

**209.** If the program tries to read from a chunk that hasn't been allocated, the value zero is returned, optionally with a comment to the user.

Chunk address 0 is always allocated first. Then we can assume that a matching chunk tag implies a nonnull chunk pointer.

This routine sets *last\_h* to the chunk found, so that we can rapidly read other words that we know must belong to the same chunk. For this purpose it is convenient to let *mem\_hash[hash\_prime]* be a chunk full of zeros, representing uninitialized memory.

```
⟨External prototypes 9⟩ +≡
  Extern octa mem_read ARGS((octa addr));
```

**210.**

```
⟨External routines 10⟩ +≡
  octa mem_read(addr)
    octa addr;
  {
    register tetra off, key;
    register int h;
    off = (addr.l & 0xffff) >> 3;
    key = (addr.l & 0xffff0000) + addr.h;
    for (h = key % hash_prime; mem_hash[h].tag != key; h--) {
      if (mem_hash[h].chunk == NULL) {
        if (verbose & uninit_mem_bit)
          errprint2("uninitialized memory read at %08x%08x", addr.h, addr.l);
        h = hash_prime; break;  /* zero will be returned */
      }
      if (h == 0) h = hash_prime;
    }
    last_h = h;
    return mem_hash[h].chunk[off];
  }
```

**211.**

```
⟨External variables 4⟩ +≡
  Extern int last_h;  /* the hash index that was most recently correct */
```

```
212. ⟨External prototypes 9⟩ +≡
Extern void mem_write ARGS((octa addr, octa val));

213. ⟨External routines 10⟩ +≡
void mem_write(addr, val)
     octa addr, val;
{
  register tetra off, key;
  register int h;
  off = (addr.l & #ffff) ≫ 3;
  key = (addr.l & #ffff0000) + addr.h;
  for (h = key % hash_prime; mem_hash[h].tag ≠ key; h--) {
    if (mem_hash[h].chunk ≡ Λ) {
      if (++mem_chunks > mem_chunks_max)
        panic(errprint1("More than %d memory chunks are needed",
              mem_chunks_max));
      mem_hash[h].chunk = (octa *) calloc(1 ≪ 13, sizeof(octa));
      if (mem_hash[h].chunk ≡ Λ)
        panic(errprint1("I can't allocate memory chunk number %d", mem_chunks));
      mem_hash[h].tag = key;
      break;
    }
    if (h ≡ 0) h = hash_prime;
  }
  last_h = h;
  mem_hash[h].chunk[off] = val;
}
```
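The lookup scheme of *mem\_read* and *mem\_write* can be tried out in isolation. The following is a minimal sketch in modern C, not the simulator's own code: it models an address as a single `uint64_t` instead of an **octa**, and the names `sketch_node`, `find_slot`, `sketch_read`, and `sketch_write` are hypothetical. It keeps the downward linear probing with wraparound, the preallocated chunk for key 0, and the all-zero chunk in the extra slot at index `HASH_PRIME`.

```c
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_OCTAS (1 << 13) /* octabytes per chunk, as in the text */
#define HASH_PRIME  2003      /* default hash_prime from section 207 */
#define MAX_CHUNKS  1000      /* default mem_chunks_max from section 207 */

typedef struct {
  uint32_t  tag;    /* 32-bit chunk key */
  uint64_t *chunk;  /* NULL, or CHUNK_OCTAS zero-initialized octabytes */
} sketch_node;

static sketch_node table[HASH_PRIME + 1]; /* slot HASH_PRIME: zero chunk */
static int chunks_used = 0;

static uint64_t *new_chunk(void) {
  uint64_t *c = calloc(CHUNK_OCTAS, sizeof(uint64_t));
  if (c == NULL) abort(); /* the simulator calls panic() here */
  return c;
}

static void init_once(void) {
  if (table[HASH_PRIME].chunk != NULL) return;
  table[HASH_PRIME].chunk = new_chunk(); /* represents uninitialized memory */
  table[0].chunk = new_chunk();          /* key 0 hashes to slot 0 */
  chunks_used = 1;
}

/* Same key computation as the text: low half masked, plus high half. */
static uint32_t key_of(uint64_t addr) {
  return (uint32_t)(addr & 0xffff0000u) + (uint32_t)(addr >> 32);
}

/* Probe downward from key % HASH_PRIME, wrapping from slot 0 to the top.
   Returns HASH_PRIME (the zero chunk) if the key is absent and create is 0. */
static int find_slot(uint32_t key, int create) {
  int h;
  init_once();
  for (h = (int)(key % HASH_PRIME); table[h].tag != key; h--) {
    if (table[h].chunk == NULL) {
      if (!create) return HASH_PRIME;
      if (++chunks_used > MAX_CHUNKS) abort();
      table[h].chunk = new_chunk();
      table[h].tag = key;
      return h;
    }
    if (h == 0) h = HASH_PRIME;
  }
  return h;
}

uint64_t sketch_read(uint64_t addr) {
  return table[find_slot(key_of(addr), 0)].chunk[(addr & 0xffff) >> 3];
}

void sketch_write(uint64_t addr, uint64_t val) {
  table[find_slot(key_of(addr), 1)].chunk[(addr & 0xffff) >> 3] = val;
}
```

Reading an address whose chunk was never written returns zero without allocating anything, while two keys that collide modulo `HASH_PRIME` end up in neighboring slots.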

**214.** The memory is characterized by several parameters, depending on the characteristics of the memory bus being simulated. Let  $bus\_words$  be the number of octabytes read or written simultaneously (usually  $bus\_words$  is 1 or 2; it must be a power of 2). The number of clock cycles needed to read or write  $c*bus\_words$  octabytes that all belong to the same cache block is assumed to be  $mem\_addr\_time + c*mem\_read\_time$  or  $mem\_addr\_time + c*mem\_write\_time$ , respectively.

```
⟨ External variables 4⟩ +≡
Extern int mem_addr_time; /* cycles to transmit an address on memory bus */
Extern int bus_words; /* width of memory bus, in octabytes */
Extern int mem_read_time; /* cycles to read from main memory */
Extern int mem_write_time; /* cycles to write to main memory */
Extern lockvar mem_lock; /* is nonnull when the bus is busy */
```
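The timing rule of section 214 is plain arithmetic, and a small sketch may make it concrete. The helper names below are hypothetical and the numeric values in the usage note are made up for illustration; only the formula itself comes from the text. The rounding-up helper mirrors the expression that section 219 uses when a transfer is not a whole number of bus widths.

```c
/* Cycles to move c * bus_words octabytes that lie in one cache block:
   one address phase plus c data phases (section 214). */
static int mem_transfer_cycles(int c, int mem_addr_time, int per_phase_time) {
  return mem_addr_time + c * per_phase_time;
}

/* Number of data phases needed for n_octas octabytes on a bus that is
   bus_words octabytes wide, rounded up as in section 219. */
static int bus_phases(int n_octas, int bus_words) {
  return (n_octas + bus_words - 1) / bus_words;
}
```

For example, with an assumed address time of 20 cycles and read time of 20 cycles, reading 8 octabytes over a 2-octabyte bus takes `mem_transfer_cycles(bus_phases(8, 2), 20, 20)` cycles, that is, 20 + 4 * 20.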

**215.** One of the principal ways to write memory is to invoke a *flush\_to\_mem* coroutine, which is the *Scache→flusher* if there is an S-cache, or the *Dcache→flusher* if there is a D-cache but no S-cache.

When such a coroutine is started, its *data→ptr\_a* will be *Scache* or *Dcache*. The data to be written will just have been copied to the cache's *outbuf*.

```
⟨Cases for control of special coroutines 126⟩ +≡
case flush_to_mem:
  { register cache *c = (cache *) data→ptr_a;
    switch (data→state) {
    case 0: if (mem_lock) wait(1);
      data→state = 1;
    case 1: set_lock(self, mem_lock);
      data→state = 2;
      ⟨Write the dirty data of c→outbuf and wait for the bus 216⟩;
    case 2: goto terminate; /* this frees mem_lock and c→outbuf */
    }
  }
```

```
216. ⟨Write the dirty data of c→outbuf and wait for the bus 216⟩ ≡
{
  register int off, last_off, count, first, ii;
  register int del = c→gg ≫ 3; /* octabytes per granule */
  octa addr;
  addr = c→outbuf.tag; off = (addr.l & #ffff) ≫ 3;
  for (i = j = 0, first = 1, count = 0; j < c→bb ≫ c→g; j++) {
    ii = i + del;
    if (¬c→outbuf.dirty[j]) i = ii, off += del, addr.l += del ≪ 3;
    else while (i < ii) {
      if (first) {
        count++; last_off = off; first = 0;
        mem_write(addr, c→outbuf.data[i]);
      } else {
        if ((off ⊕ last_off) & (-bus_words)) count++;
        last_off = off;
        mem_hash[last_h].chunk[off] = c→outbuf.data[i];
      }
      i++; off++; addr.l += 8;
    }
  }
  wait(mem_addr_time + count * mem_write_time);
}
```

This code is used in section 215.

```
ARGS = macro, §6.
bb: int, §167.
cache = struct, §167.
calloc: void *(), <stdlib.h>.
chunk: octa *, §206.
data: register control *, §124.
data: octa *, §167.
Dcache: cache *, §168.
dirty: char *, §167.
errprint1 = macro(), §13.
Extern = macro, §4.
flush_to_mem = 97, §129.
flusher: coroutine, §167.
g: int, §167.
gg: int, §167.
h: tetra, §17.
hash_prime: int, §207.
i: register int, §12.
j: register int, §12.
l: tetra, §17.
last_h: int, §211.
lockvar = coroutine *, §37.
mem_chunks: int, §207.
mem_chunks_max: int, §207.
mem_hash: chunknode *, §207.
octa = struct, §17.
outbuf: cacheblock, §167.
panic = macro(), §13.
ptr_a: void *, §44.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro(), §37.
state: int, §44.
tag: tetra, §206.
tag: octa, §167.
terminate: label, §125.
tetra = unsigned int, §17.
wait = macro(), §125.
```
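The way section 216 charges for bus traffic can be isolated from the coroutine machinery. The sketch below is an illustration under simplifying assumptions, not the simulator's code: the function name `count_bus_writes` and its parameter list are hypothetical, and it only counts transfers instead of writing any data. A new transfer is charged for the first dirty octabyte and then whenever the octabyte offset moves into a different group of `bus_words` octabytes.

```c
/* Count the bus write transfers for one output buffer.
   dirty[j] is nonzero if granule j is dirty; each granule holds del
   octabytes; start_off is the octabyte offset of the block in its chunk;
   bus_words is the bus width in octabytes (a power of 2). */
static int count_bus_writes(const unsigned char *dirty, int granules,
                            int del, int start_off, int bus_words) {
  int count = 0, first = 1, last_off = 0, off = start_off;
  int j, k;
  for (j = 0; j < granules; j++) {
    if (!dirty[j]) { off += del; continue; } /* clean granule: skip it */
    for (k = 0; k < del; k++, off++) {
      if (first) { count++; first = 0; }
      else if ((off ^ last_off) & -bus_words) count++; /* new bus group */
      last_off = off;
    }
  }
  return count;
}
```

With four one-octabyte granules that are all dirty and a two-octabyte bus, the count is 2; if only the first and last granules are dirty, the count is still 2, because they fall into different bus groups.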

**217.** Cache transfers. We have seen that the *Dcache→flusher* sends data directly to the main memory if there is no S-cache. But if both D-cache and S-cache exist, the *Dcache→flusher* is a more complicated coroutine of type *flush\_to\_S*. In this case we need to deal with the fact that the S-cache blocks might be larger than the D-cache blocks; furthermore, the S-cache might have a write-around and/or write-through policy, etc. But one simplifying fact does help us: We know that the flusher coroutine will not be aborted until it has run to completion.

Some machines, such as the Alpha 21164, have an additional cache between the S-cache and memory, called the B-cache (the "backup cache"). A B-cache could be simulated by extending the logic used here; but such extensions of the present program are left to the interested reader.

```
⟨Cases for control of special coroutines 126⟩ +≡
case flush_to_S:
  { register cache *c = (cache *) data→ptr_a;
    register int block_diff = Scache→bb - c→bb;
    p = (cacheblock *) data→ptr_b;
    switch (data→state) {
    case 0: if (Scache→lock) wait(1);
      data→state = 1;
    case 1: set_lock(self, Scache→lock);
      data→ptr_b = (void *) cache_search(Scache, c→outbuf.tag);
      if (data→ptr_b) data→state = 4;
      else if (Scache→mode & WRITE_ALLOC) data→state = (block_diff ? 2 : 3);
      else data→state = 6;
      wait(Scache→access_time);
    case 2: ⟨Fill Scache→inbuf with clean memory data 219⟩;
    case 3: ⟨Allocate a slot p in the S-cache 218⟩;
      if (block_diff) ⟨Copy Scache→inbuf to slot p 220⟩;
    case 4: copy_block(c, &(c→outbuf), Scache, p);
      hit_set = cache_addr(Scache, c→outbuf.tag); use_and_fix(Scache, p);
        /* p not moved */
      data→state = 5; wait(Scache→copy_in_time);
    case 5: if ((Scache→mode & WRITE_BACK) ≡ 0) { /* write-through */
        if (Scache→flusher.next) wait(1);
        flush_cache(Scache, p, true);
      }
      goto terminate;
    case 6: ⟨Handle write-around when flushing to the S-cache 221⟩;
    }
  }
```

```
218. ⟨Allocate a slot p in the S-cache 218⟩ ≡
if (Scache→filler.next) wait(1); /* perhaps an unnecessary precaution? */
p = alloc_slot(Scache, c→outbuf.tag);
if (¬p) wait(1);
data→ptr_b = (void *) p;
p→tag = c→outbuf.tag; p→tag.l = c→outbuf.tag.l & (-Scache→bb);
```

This code is used in section 217.

**219.** We only need to read *block\_diff* bytes, but it's easier to read them all and to charge only for reading the ones we needed.

```
⟨Fill Scache→inbuf with clean memory data 219⟩ ≡
{ register int count = block_diff ≫ 3;
  register int off, delay;
  octa addr;
  if (mem_lock) wait(1);
  addr.h = c→outbuf.tag.h; addr.l = c→outbuf.tag.l & -Scache→bb;
  off = (addr.l & #ffff) ≫ 3;
  for (j = 0; j < Scache→bb ≫ 3; j++)
    if (j ≡ 0) Scache→inbuf.data[j] = mem_read(addr);
    else Scache→inbuf.data[j] = mem_hash[last_h].chunk[j + off];
  set_lock(&mem_locker, mem_lock);
  delay = mem_addr_time + (int)((count + bus_words - 1)/(bus_words)) * mem_read_time;
  startup(&mem_locker, delay);
  data→state = 3; wait(delay);
}
```

This code is used in section 217.

```
220. ⟨Copy Scache→inbuf to slot p 220⟩ ≡
{
  register octa *d = p→data;
  p→data = Scache→inbuf.data; Scache→inbuf.data = d;
}
```

This code is used in section 217.

```
access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
bb: int, §167.
bus_words: int, §214.
cache = struct, §167.
cache_addr = macro(), §192.
cache_search: static cacheblock *(), §193.
cacheblock = struct, §167.
chunk: octa *, §206.
copy_block: static void (), §185.
copy_in_time: int, §167.
data: register control *, §124.
data: octa *, §167.
Dcache: cache *, §168.
filler: coroutine, §167.
flush_cache: static void (), §203.
flush_to_S = 96, §129.
flusher: coroutine, §167.
h: tetra, §17.
hit_set: cacheset, §194.
inbuf: cacheblock, §167.
j: register int, §12.
l: tetra, §17.
last_h: int, §211.
lock: lockvar, §167.
mem_addr_time: int, §214.
mem_hash: chunknode *, §207.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem_read: octa (), §210.
mem_read_time: int, §214.
mode: int, §167.
next: coroutine *, §23.
octa = struct, §17.
outbuf: cacheblock, §167.
p: register cacheblock *, §258.
ptr_a: void *, §44.
ptr_b: void *, §44.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro(), §37.
startup: static void (), §31.
state: int, §44.
tag: octa, §167.
terminate: label, §125.
true = 1, §11.
use_and_fix: static cacheblock *(), §196.
wait = macro(), §125.
WRITE_ALLOC = 2, §166.
WRITE_BACK = 1, §166.
```

**221.** Here we assume that the granularity is 8.

```
⟨Handle write-around when flushing to the S-cache 221⟩ ≡
if (Scache→flusher.next) wait(1);
Scache→outbuf.tag.h = c→outbuf.tag.h;
Scache→outbuf.tag.l = c→outbuf.tag.l & (-Scache→bb);
for (j = 0; j < Scache→bb ≫ Scache→g; j++) Scache→outbuf.dirty[j] = false;
copy_block(c, &(c→outbuf), Scache, &(Scache→outbuf));
startup(&Scache→flusher, Scache→copy_out_time);
goto terminate;
```

This code is used in section 217.

**222.** The S-cache gets new data from memory by invoking a *fill\_from\_mem* coroutine; the I-cache or D-cache may also invoke a *fill\_from\_mem* coroutine, if there is no S-cache. When such a coroutine is invoked, it holds *mem\_lock*, and its caller has gone to sleep. A physical memory address is given in *data→z.o*, and *data→ptr\_a* specifies either *Icache*, *Dcache*, or *Scache*. Furthermore, *data→ptr\_b* specifies a block within that cache, determined by the *alloc\_slot* routine. The coroutine simulates reading the contents of the specified memory location, places the result in the *x.o* field of its caller's control block, and wakes up the caller. It proceeds to fill the cache's *inbuf* and, ultimately, the specified cache block, before waking the caller again.

Let *c = data→ptr\_a*. The caller is then *c→fill\_lock*, if this variable is nonnull. However, the caller might not wish to be awoken or to receive the data (for example, if it has been aborted). In such cases *c→fill\_lock* will be Λ; the filling action continues without the wakeup calls. If *c = Scache*, the S-cache will be locked and the caller will not have been aborted.

```
⟨Cases for control of special coroutines 126⟩ +≡
case fill_from_mem:
  { register cache *c = (cache *) data→ptr_a;
    register coroutine *cc = c→fill_lock;
    switch (data→state) {
    case 0: data→x.o = mem_read(data→z.o);
      if (cc) {
        cc→ctl→x.o = data→x.o; awaken(cc, mem_read_time);
      }
      data→state = 1;
      ⟨Read data into c→inbuf and wait for the bus 223⟩;
    case 1: release_lock(self, mem_lock); data→state = 2;
    case 2: if (c ≠ Scache) {
        if (c→lock) wait(1);
        set_lock(self, c→lock);
      }
      if (cc) awaken(cc, c→copy_in_time); /* the second wakeup call */
      load_cache(c, (cacheblock *) data→ptr_b);
      data→state = 3; wait(c→copy_in_time);
    case 3: goto terminate;
    }
  }
```

**223.** If *c*'s cache size is no larger than the memory bus, we wait an extra cycle, so that there will be two wakeup calls.

```
⟨Read data into c→inbuf and wait for the bus 223⟩ ≡
{
  register int count, off;
  c→inbuf.tag = data→z.o; c→inbuf.tag.l &= -c→bb;
  count = c→bb ≫ 3, off = (c→inbuf.tag.l & #ffff) ≫ 3;
  for (i = 0; i < count; i++, off++) c→inbuf.data[i] = mem_hash[last_h].chunk[off];
  if (count ≤ bus_words) wait(1 + mem_read_time)
  else wait((int)(count/bus_words) * mem_read_time);
}
```

This code is used in section 222.
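The two branches of the wait in section 223 are easy to check by hand. Here is a minimal sketch, assuming a hypothetical helper `fill_wait_cycles` that takes the block size in bytes; the numbers in the usage note are illustrative and not the simulator's defaults.

```c
/* Cycles that the fill_from_mem coroutine waits after its first read,
   following section 223: a block that fits in one bus transfer gets one
   extra cycle so that the two wakeup calls stay distinct. */
static int fill_wait_cycles(int block_bytes, int bus_words, int mem_read_time) {
  int count = block_bytes >> 3; /* octabytes per cache block */
  if (count <= bus_words) return 1 + mem_read_time;
  return (count / bus_words) * mem_read_time;
}
```

With a 64-byte block, a two-octabyte bus, and a read time of 10 cycles, the wait is 4 * 10 cycles; with a 16-byte block on the same bus it is 1 + 10 cycles.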

```
alloc_slot: static cacheblock *(), §205.
awaken = macro(), §125.
bb: int, §167.
bus_words: int, §214.
c: register cache *, §217.
cache = struct, §167.
cacheblock = struct, §167.
chunk: octa *, §206.
copy_block: static void (), §185.
copy_in_time: int, §167.
copy_out_time: int, §167.
coroutine = struct, §23.
ctl: control *, §23.
data: register control *, §124.
data: octa *, §167.
Dcache: cache *, §168.
dirty: char *, §167.
false = 0, §11.
fill_from_mem = 95, §129.
fill_lock: lockvar, §167.
flusher: coroutine, §167.
g: int, §167.
h: tetra, §17.
i: register int, §12.
Icache: cache *, §168.
inbuf: cacheblock, §167.
j: register int, §12.
l: tetra, §17.
last_h: int, §211.
load_cache: static void (), §201.
lock: lockvar, §167.
mem_hash: chunknode *, §207.
mem_lock: lockvar, §214.
mem_read: octa (), §210.
mem_read_time: int, §214.
next: coroutine *, §23.
o: octa, §40.
outbuf: cacheblock, §167.
ptr_a: void *, §44.
ptr_b: void *, §44.
release_lock = macro(), §37.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro(), §37.
startup: static void (), §31.
state: int, §44.
tag: octa, §167.
terminate: label, §125.
wait = macro(), §125.
x: specnode, §44.
z: spec, §44.
```

**224.** The *fill\_from\_S* coroutine has the same conventions as *fill\_from\_mem*, except that the data comes directly from the S-cache if it is present there. This is the *filler* coroutine for the I-cache and D-cache if an S-cache is present.

```
⟨Cases for control of special coroutines 126⟩ +≡
case fill_from_S:
  { register cache *c = (cache *) data→ptr_a;
    register coroutine *cc = c→fill_lock;
    p = (cacheblock *) data→ptr_c;
    switch (data→state) {
    case 0: p = cache_search(Scache, data→z.o);
      if (p) goto S_non_miss;
      data→state = 1;
    case 1: ⟨Start the S-cache filler 225⟩;
      data→state = 2; sleep;
    case 2: if (cc) { /* this data has been supplied by Scache→filler */
        cc→ctl→x.o = data→x.o; /* we propagate it back */
        awaken(cc, Scache→access_time);
      }
      data→state = 3; sleep; /* when we awake, the S-cache will have our data */
    S_non_miss: if (cc) {
        cc→ctl→x.o = p→data[(data→z.o.l & (Scache→bb - 1)) ≫ 3];
        awaken(cc, Scache→access_time);
      }
    case 3: ⟨Copy data from p into c→inbuf 226⟩;
      data→state = 4; wait(Scache→access_time);
    case 4: Scache→lock = Λ; /* we had been holding that lock */
      data→state = 5;
    case 5: if (c→lock) wait(1);
      set_lock(self, c→lock);
      load_cache(c, (cacheblock *) data→ptr_b);
      data→state = 6; wait(c→copy_in_time);
    case 6: if (cc) awaken(cc, 1); /* second wakeup call */
      goto terminate;
    }
  }
```

**225.** We are already holding the *Scache→lock*, but we're about to take on the *Scache→fill\_lock* too (with the understanding that one is "stronger" than the other). For a short time the *Scache→lock* will point to us but we will point to *Scache→fill\_lock*; this will not cause difficulty, because the present coroutine is not abortable.

```
⟨Start the S-cache filler 225⟩ ≡
if (Scache→filler.next ∨ mem_lock) wait(1);
p = alloc_slot(Scache, data→z.o);
if (¬p) wait(1);
set_lock(&Scache→filler, mem_lock);
set_lock(self, Scache→fill_lock);
data→ptr_c = Scache→filler_ctl.ptr_b = (void *) p;
Scache→filler_ctl.z.o = data→z.o;
startup(&Scache→filler, mem_addr_time);
```

This code is used in section 224.

**226.** The S-cache blocks might be wider than the blocks of the I-cache or D-cache, so the copying in this step isn't quite trivial.

```
⟨Copy data from p into c→inbuf 226⟩ ≡
{ register int off;
  c→inbuf.tag = data→z.o; c→inbuf.tag.l &= -c→bb;
  for (j = 0, off = (c→inbuf.tag.l & (Scache→bb - 1)) ≫ 3; j < c→bb ≫ 3; j++, off++)
    c→inbuf.data[j] = p→data[off];
  release_lock(self, Scache→fill_lock);
  set_lock(self, Scache→lock);
}
```

This code is used in section 224.
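The index arithmetic of section 226 can be seen in a standalone sketch. The code below is an illustration only: `block_base` and `copy_subblock` are hypothetical names, block sizes are assumed to be powers of 2, and only the low 32 bits of the address are modeled. It aligns the address to the smaller block and then locates that smaller block inside the wider one.

```c
#include <stdint.h>

/* Round the low address half down to a multiple of the block size bb,
   which is what the text writes as tag.l & -bb. */
static uint32_t block_base(uint32_t addr_l, uint32_t bb) {
  return addr_l & (0u - bb);
}

/* Copy the small block containing addr_l out of a wider block.
   src_block holds big_bb/8 octabytes; dst receives small_bb/8 octabytes. */
static void copy_subblock(uint64_t *dst, const uint64_t *src_block,
                          uint32_t addr_l, uint32_t small_bb, uint32_t big_bb) {
  uint32_t tag_l = block_base(addr_l, small_bb);
  uint32_t off = (tag_l & (big_bb - 1)) >> 3; /* octabyte index in src_block */
  uint32_t j;
  for (j = 0; j < (small_bb >> 3); j++, off++) dst[j] = src_block[off];
}
```

For a 64-byte source block and a 16-byte destination block, the address `0x1234` selects octabytes 6 and 7 of the source, since `0x1230 & 63` is 48.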

```
access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
awaken = macro(), §125.
bb: int, §167.
cache = struct, §167.
cache_search: static cacheblock *(), §193.
cacheblock = struct, §167.
copy_in_time: int, §167.
coroutine = struct, §23.
ctl: control *, §23.
data: octa *, §167.
data: register control *, §124.
fill_from_mem = 95, §129.
fill_from_S = 94, §129.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
inbuf: cacheblock, §167.
j: register int, §12.
l: tetra, §17.
load_cache: static void (), §201.
lock: lockvar, §167.
mem_addr_time: int, §214.
mem_lock: lockvar, §214.
next: coroutine *, §23.
o: octa, §40.
p: register cacheblock *, §258.
ptr_a: void *, §44.
ptr_b: void *, §44.
ptr_c: void *, §44.
release_lock = macro(), §37.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro(), §37.
sleep = macro, §125.
startup: static void (), §31.
state: int, §44.
tag: octa, §167.
terminate: label, §125.
wait = macro(), §125.
x: specnode, §44.
z: spec, §44.
```

**227.** The instruction PRELD X,\$Y,\$Z generates  $\lfloor X/2^b \rfloor$  commands if there are  $2^b$  bytes per block in the D-cache. These commands will try to preload blocks \$Y + \$Z, \$Y + \$Z +  $2^b$, ..., into the cache if it is not too busy.

Similar considerations apply to the instructions PREGO X,\$Y,\$Z and PREST X,\$Y,\$Z.

```
⟨Special cases of instruction dispatch 117⟩ +≡
case preld: case prest: if (¬Dcache) goto noop_inst;
  if (cool→xx ≥ Dcache→bb) cool→interim = true;
  cool→ptr_a = (void *) mem.up; break;
case prego: if (¬Icache) goto noop_inst;
  if (cool→xx ≥ Icache→bb) cool→interim = true;
  cool→ptr_a = (void *) mem.up; break;
```

**228.** If the block size is 64, a command like PREST 200,\$Y,\$Z is actually issued as four commands PREST 200,\$Y,\$Z; PREST 191,\$Y,\$Z; PREST 127,\$Y,\$Z; PREST 63,\$Y,\$Z. An interruption will then be able to resume properly. In the pipeline, the instruction PREST 200,\$Y,\$Z is considered to affect bytes \$Y + \$Z + 192 through \$Y + \$Z + 200, or fewer bytes if \$Y + \$Z is not a multiple of 64. (Remember that these instructions are only hints; we act on them only if it is reasonably convenient to do so.)

```
⟨Get ready for the next step of PRELD or PREST 228⟩ ≡
head→inst = (head→inst & ~((Dcache→bb - 1) ≪ 16)) - #10000;
```

This code is used in section 81.

```
229. ⟨Get ready for the next step of PREGO 229⟩ ≡
head→inst = (head→inst & ~((Icache→bb - 1) ≪ 16)) - #10000;
```

This code is used in section 81.
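The one-line update in sections 228 and 229 rounds the X field down to a multiple of the block size and then subtracts 1, which yields the sequence 200, 191, 127, 63 quoted above for 64-byte blocks. The sketch below reproduces that arithmetic on a bare 32-bit word; the helper names are hypothetical, and the opcode byte is left as zero because it does not take part in the computation as long as X is at least the block size.

```c
#include <stdint.h>

/* Extract the X field (bits 16..23) of an instruction word. */
static unsigned x_field(uint32_t inst) {
  return (inst >> 16) & 0xff;
}

/* Clear the low bits of X below the block size bb (a power of 2),
   then subtract 1 from X: the step used in sections 228 and 229. */
static uint32_t next_prefetch_step(uint32_t inst, uint32_t bb) {
  return (inst & ~((bb - 1) << 16)) - 0x10000u;
}
```

Starting from X = 200 with `bb` equal to 64, three applications give X = 191, 127, and 63, while the Y and Z bytes of the word stay unchanged.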

**230.** Another coroutine, called *cleanup*, is occasionally called into action to remove dirty data from the D-cache and S-cache. If it is invoked by starting in state 0, with its i field set to sync, it will clean everything. It can also be invoked in state 4, with its i field set to syncd and with a physical address in its z.o field; then it simply makes sure that no D-cache or S-cache blocks associated with that address are dirty.

Field x.o.h should be set to zero if items are expected to remain in the cache after being cleaned; otherwise field x.o.h should be set to  $sign\_bit$ .

The coroutine that invokes *cleanup* should hold *clean\_lock*. If that coroutine dies, because of an interruption, the *cleanup* coroutine will terminate prematurely.

We assume that the D-cache and S-cache have some sort of way to identify their first dirty block, if any, in *access\_time* cycles.

```
⟨Global variables 20⟩ +≡
coroutine clean_co;
control clean_ctl;
lockvar clean_lock;

231. ⟨Initialize everything 22⟩ +≡
clean_co.ctl = &clean_ctl;
clean_co.name = "Clean";
clean_co.stage = cleanup;
clean_ctl.go.o.l = 4;
```

```
232. ⟨Cases for control of special coroutines 126⟩ +≡
case cleanup: p = (cacheblock *) data→ptr_b;
  switch (data→state) {
    ⟨Cases 0 through 4, for the D-cache 233⟩;
    ⟨Cases 5 through 9, for the S-cache 234⟩;
    case 10: goto terminate;
  }
```

```
access_time: int, §167.
bb: int, §167.
cacheblock = struct, §167.
cleanup = 91, §129.
control = struct, §44.
cool: control *, §60.
coroutine = struct, §23.
ctl: control *, §23.
data: register control *, §124.
Dcache: cache *, §168.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
i: register int, §12.
Icache: cache *, §168.
inst: tetra, §68.
interim: bool, §44.
l: tetra, §17.
lockvar = coroutine *, §37.
mem: specnode, §115.
name: char *, §23.
noop_inst: label, §118.
o: octa, §40.
p: register cacheblock *, §258.
prego = 73, §49.
preld = 61, §49.
prest = 62, §49.
ptr_a: void *, §44.
ptr_b: void *, §44.
sign_bit = macro, §80.
stage: int, §23.
state: int, §44.
sync = 79, §49.
syncd = 64, §49.
terminate: label, §125.
true = 1, §11.
up: specnode *, §40.
x: specnode, §44.
xx: unsigned char, §44.
z: spec, §44.
```

```
233.
          \langle \text{Cases 0 through 4, for the D-cache 233} \rangle \equiv
case 0: if (Dcache \neg lock \lor (j = qet\_reader(Dcache) < 0)) wait(1);
   startup(\&Dcache \neg reader[j], Dcache \neg access\_time);
   set\_lock(self, Dcache \neg lock);
   i = i = 0:
Dclean\_loop: p = (i < Dcache \neg cc ? \&(Dcache \neg set[i][j]) : \&(Dcache \neg victim[j]));
   if (p \rightarrow taq.h \& siqn\_bit) goto Dclean\_inc;
   if (\neg is\_dirty(Dcache, p)) {
      p \rightarrow tag.h \mid = data \rightarrow x.o.h; goto Dclean\_inc;
   }
   data \rightarrow y.o.h = i, data \rightarrow y.o.l = j;
Dclean: data \rightarrow state = 1; data \rightarrow ptr\_b = (void *) p; wait(Dcache \rightarrow access\_time);
case 1: if (Dcache \neg flusher.next) wait(1);
   flush\_cache(Dcache, p, data \rightarrow x.o.h \equiv 0);
   p \rightarrow tag.h \mid = data \rightarrow x.o.h;
   release\_lock(self, Dcache \rightarrow lock);
   data \rightarrow state = 2; wait(Dcache \rightarrow copy\_out\_time);
                                                     /* premature termination */
case 2: if (\neg clean\_lock) goto done;
   if (Dcache \neg flusher.next) wait (1);
   if (data \rightarrow i \neq sync) goto Sprep;
   data \rightarrow state = 3:
case 3: if (Dcache \neg lock \lor (j = get\_reader(Dcache) < 0)) wait(1);
   startup(\&Dcache \neg reader[j], Dcache \neg access\_time);
   set\_lock(self, Dcache \neg lock);
   i = data \rightarrow u.o.h, i = data \rightarrow u.o.l:
Dclean\_inc: j++;
   if (i < Dcache \neg cc \land j \equiv Dcache \neg aa) j = 0, i++;
   if (i \equiv Dcache \neg cc \land j \equiv Dcache \neg vv) {
      data \neg state = 5; wait(Dcache \neg access\_time);
   goto Dclean_loop;
case 4: if (Dcache \neg lock \lor (j = get\_reader(Dcache) < 0)) wait(1);
   startup(\&Dcache \neg reader[j], Dcache \neg access\_time);
   set\_lock(self, Dcache \rightarrow lock);
   p = cache\_search(Dcache, data \neg z.o);
   if (p) {
      demote\_and\_fix(Dcache, p);
      if (is_dirty(Dcache, p)) goto Dclean;
   data \rightarrow state = 9; wait(Dcache \rightarrow access\_time);
This code is used in section 232.
          \langle \text{Cases 5 through 9, for the S-cache 234} \rangle \equiv
case 5: if (self \neg lockloc) *(self \neg lockloc) = \Lambda, self \neg lockloc = \Lambda;
   if (\neg Scache) goto done:
   if (Scache \rightarrow lock) wait (1);
   set\_lock(self, Scache \neg lock);
   i = j = 0;
Sclean\_loop: p = (i < Scache \neg cc ? \&(Scache \neg set[i][j]) : \&(Scache \neg victim[j]));
```

flusher: coroutine, §167.

```
if (p \rightarrow taq.h \& siqn\_bit) goto Sclean\_inc;
   if (\neg is\_dirty(Scache, p)) {
      p \rightarrow taq.h \mid = data \rightarrow x.o.h; goto Sclean\_inc;
   data \rightarrow y.o.h = i, data \rightarrow y.o.l = i;
Sclean: data \rightarrow state = 6; data \rightarrow ptr_b = (void *) p; wait(Scache \rightarrow access\_time);
case 6: if (Scache \rightarrow flusher.next) wait(1);
   flush\_cache(Scache, p, data \rightarrow x.o.h \equiv 0);
   p \rightarrow tag.h \mid = data \rightarrow x.o.h;
   release\_lock(self, Scache \rightarrow lock);
   data \neg state = 7; wait(Scache \neg copy\_out\_time);
case 7: if (\neg clean\_lock) goto done:
                                                   /* premature termination */
   if (Scache \neg flusher.next) wait (1);
   if (data \rightarrow i \neq sync) goto done;
   data \rightarrow state = 8;
case 8: if (Scache \neg lock) wait(1);
   set\_lock(self, Scache \neg lock);
   i = data \rightarrow y.o.h, j = data \rightarrow y.o.l;
Sclean\_inc: j++;
   if (i < Scache \neg cc \land j \equiv Scache \neg aa) j = 0, i++;
   if (i \equiv Scache \neg cc \land j \equiv Scache \neg vv) {
      data \rightarrow state = 10; wait(Scache \rightarrow access\_time);
   }
   goto Sclean_loop;
Sprep: data \rightarrow state = 9;
case 9: if (self¬lockloc) release_lock(self, Dcache¬lock);
   if (\neg Scache) goto done;
   if (Scache \rightarrow lock) wait(1);
   set\_lock(self, Scache \neg lock);
   p = cache\_search(Scache, data \neg z.o);
   if (p) {
      demote\_and\_fix(Scache, p);
      if (is_dirty(Scache, p)) goto Sclean;
   data \neg state = 10; \ wait(Scache \neg access\_time);
This code is used in section 232.
aa: int, §167.
access_time: int, §167.
cache_search: static cacheblock *(), §193.
cc: int, §167.
clean_lock: lockvar, §230.
copy_out_time: int, §167.
data: register control *, §124.
Dcache: cache *, §168.
demote_and_fix: static cacheblock *(), §199.
done: label, §125.
flush_cache: static void (), §203.
get_reader: static int (), §183.
h: tetra, §17.
i: internal_opcode, §44.
i: register int, §12.
is_dirty: static bool (), §170.
j: register int, §12.
l: tetra, §17.
lock: lockvar, §167.
lockloc: coroutine **, §23.
next: coroutine *, §23.
o: octa, §40.
p: register cacheblock *, §258.
ptr_b: void *, §44.
reader: coroutine *, §167.
Scache: cache *, §168.
self: register coroutine *, §124.
set: cacheset *, §167.
set_lock = macro (), §37.
sign_bit = macro, §80.
startup: static void (), §31.
state: int, §44.
sync = 79, §49.
tag: octa, §167.
victim: cacheset, §167.
vv: int, §167.
wait = macro (), §125.
x: specnode, §44.
y: spec, §44.
```

release\_lock = macro (), §37.

z: spec, §44.

**235.** Virtual address translation. Special arrays of coroutines and control blocks come into play when we need to implement MMIX's rather complicated page table mechanism for virtual address translation. In effect, we have up to ten control blocks *outside* of the reorder buffer that are capable of executing instructions just as if they were part of that buffer. The "opcodes" of these non-abortable instructions are special internal operations called *ldptp* and *ldpte*, for loading page table pointers and page table entries.

Suppose, for example, that we need to translate a virtual address for the DT-cache in which the virtual page address  $(a_4a_3a_2a_1a_0)_{1024}$  of segment i has  $a_4=a_3=0$  and  $a_2\neq 0$ . Then the rules say that we should first find a page table pointer  $p_2$  in physical location  $2^{13}(r+b_i+2)+8a_2$ , then another page table pointer  $p_1$  in location  $p_2+8a_1$ , and finally the page table entry  $p_0$  in location  $p_1+8a_0$ . The simulator achieves this by setting up three coroutines  $c_0$ ,  $c_1$ ,  $c_2$  whose control blocks correspond to the pseudo-instructions

```
LDPTP x, [2^{63}+2^{13}(r+b_i+2)], 8a_2
LDPTP x, x, 8a_1
LDPTE x, x, 8a_0
```

where x is a hidden internal register and the other quantities are immediate values. Slight changes to the normal functionality of LDO give us the actions needed to implement LDPTP and LDPTE. Coroutine  $c_j$  corresponds to the instruction that involves  $a_j$  and computes  $p_j$ ; when  $c_0$  has computed its value  $p_0$ , we know how to translate the original virtual address.

The LDPTP and LDPTE commands return zero if their y operand is zero or if the page table does not properly match rV.
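The chain of loads described above can be sketched in plain C. This is only an illustration under simplifying assumptions: addresses are ordinary 64-bit integers, the $2^{63}$ marker bit is left out, the check that each entry matches rV is omitted, and the names `walk_page_table`, `read_octa_fn`, and `demo_read` are hypothetical, not part of the simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory reader: returns the octabyte at a physical address. */
typedef uint64_t (*read_octa_fn)(uint64_t phys);

/* Follow j page table levels for the digits a[0..j-1], where a[j-1] is the
   leading nonzero digit.  The first load uses the address
   2^13 (r + b_i + j - 1) + 8 a[j-1]; each later load uses the value just
   fetched as its base.  A zero pointer yields zero, mirroring the rule that
   LDPTP and LDPTE return zero when their y operand is zero. */
uint64_t walk_page_table(read_octa_fn read_octa, uint64_t r, uint64_t b_i,
                         const unsigned a[], int j)
{
    assert(j >= 1 && j <= 5);
    uint64_t base = (r + b_i + (uint64_t)(j - 1)) << 13;
    for (int k = j - 1; k >= 0; k--) {
        uint64_t p = read_octa(base + 8u * a[k]);
        if (p == 0) return 0;      /* page table failure */
        if (k == 0) return p;      /* this is the page table entry p_0 */
        base = p;                  /* p_k becomes the base for level k-1 */
    }
    return 0;                      /* not reached when j >= 1 */
}

/* A tiny made-up memory image, used only to exercise the sketch. */
uint64_t demo_read(uint64_t phys)
{
    switch (phys) {
    case 0x6018:  return 0x10000;   /* p_2, found at 2^13 * 3 + 8 * 3 */
    case 0x10038: return 0x20000;   /* p_1, found at p_2 + 8 * 7 */
    case 0x20028: return 0xABC007;  /* p_0, found at p_1 + 8 * 5 */
    default:      return 0;
    }
}
```

With $r=1$, $b_i=0$, and digits $a_2=3$, $a_1=7$, $a_0=5$, the three loads correspond to coroutines $c_2$, $c_1$, $c_0$ in that order.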

```
#define LDPTP PREGO   /* internally this won't cause confusion */
#define LDPTE GO

⟨Global variables 20⟩ +≡
  control IPTctl[5], DPTctl[5];   /* control blocks for I and D page translation */
  coroutine IPTco[10], DPTco[10];   /* each coroutine is a two-stage pipeline */
  char *IPTname[5] = {"IPT0", "IPT1", "IPT2", "IPT3", "IPT4"};
  char *DPTname[5] = {"DPT0", "DPT1", "DPT2", "DPT3", "DPT4"};
```

```
236. ⟨Initialize everything 22⟩ +≡
  for (j = 0; j < 5; j++) {
    DPTco[2*j].ctl = &DPTctl[j];  IPTco[2*j].ctl = &IPTctl[j];
    if (j > 0) DPTctl[j].op = IPTctl[j].op = LDPTP, DPTctl[j].i = IPTctl[j].i = ldptp;
    else DPTctl[0].op = IPTctl[0].op = LDPTE, DPTctl[0].i = IPTctl[0].i = ldpte;
    IPTctl[j].loc = DPTctl[j].loc = neg_one;
    IPTctl[j].go.o = DPTctl[j].go.o = incr(neg_one, 4);
    IPTctl[j].ptr_a = DPTctl[j].ptr_a = (void *) &mem;
    IPTctl[j].ren_x = DPTctl[j].ren_x = true;
    IPTctl[j].x.addr.h = DPTctl[j].x.addr.h = -1;
    IPTco[2*j].stage = DPTco[2*j].stage = 1;
    IPTco[2*j+1].stage = DPTco[2*j+1].stage = 2;
    IPTco[2*j].name = IPTco[2*j+1].name = IPTname[j];
    DPTco[2*j].name = DPTco[2*j+1].name = DPTname[j];
  }
  ITcache→filler_ctl.ptr_c = (void *) &IPTco[0];
  DTcache→filler_ctl.ptr_c = (void *) &DPTco[0];
```

```
addr: octa, §40.
control = struct, §44.
coroutine = struct, §23.
ctl: control *, §23.
DTcache: cache *, §168.
filler_ctl: control, §167.
GO = #9e, §47.
go: specnode, §44.
h: tetra, §17.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
ITcache: cache *, §168.
j: register int, §12.
ldpte = 58, §49.
ldptp = 57, §49.
loc: octa, §44.
mem: specnode, §115.
name: char *, §23.
neg_one: octa, MMIX-ARITH §4.
o: octa, §40.
op: mmix_opcode, §44.
PREGO = #9c, §47.
ptr_a: void *, §44.
ptr_c: void *, §44.
ren_x: bool, §44.
stage: int, §23.
true = 1, §11.
x: specnode, §44.
```

**237.** Page table calculations are invoked by a coroutine of type fill\_from\_virt, which is used to fill the IT-cache or DT-cache. The calling conventions of fill\_from\_virt are analogous to those of fill\_from\_mem or fill\_from\_S: A virtual address is supplied in data→y.o, and data→ptr\_a points to a cache (ITcache or DTcache), while data→ptr\_b is a block in that cache. We wake up the caller, who holds the cache's fill\_lock, as soon as the translation of the given address has been calculated, unless the caller has been aborted. (No second wakeup call is necessary.)

```
⟨Cases for control of special coroutines 126⟩ +≡
case fill_from_virt:
   { register cache *c = (cache *) data→ptr_a;
      register coroutine *cc = c→fill_lock;
      register coroutine *co = (coroutine *) data→ptr_c;   /* &IPTco[0] or &DPTco[0] */
      octa aaaaa;
      switch (data→state) {
      case 0: ⟨Start up auxiliary coroutines to compute the page table entry 243⟩;
          data→state = 1;
      case 1: if (data→b.p) {
             if (data→b.p→known) data→b.o = data→b.p→o, data→b.p = Λ;
             else wait(1);
          }
          ⟨Compute the new entry for c→inbuf and give the caller a sneak preview 245⟩;
          data→state = 2;
      case 2: if (c→lock) wait(1);
          set_lock(self, c→lock);
          load_cache(c, (cacheblock *) data→ptr_b);
          data→state = 3;  wait(c→copy_in_time);
      case 3: data→b.o = zero_octa;  goto terminate;
      }
   }
```

**238.** The current contents of rV, the special virtual translation register, are kept unpacked in several global variables *page\_r*, *page\_s*, etc., for convenience. Whenever rV changes, we recompute all these variables.

```
⟨Global variables 20⟩ +≡
  int page_n;        /* the 10-bit n field of rV, times 8 */
  int page_r;        /* the 27-bit r field of rV */
  int page_s;        /* the 8-bit s field of rV */
  int page_f;        /* the 3-bit f field of rV */
  int page_b[5];     /* the 4-bit b fields of rV; page_b[0] = 0 */
  octa page_mask;    /* the least significant s bits */
  bool page_bad = true;   /* does rV violate the rules? */
```

```
239. ⟨Update the page variables 239⟩ ≡
  { octa rv;
    rv = data→z.o;
    page_f = rv.l & 7, page_bad = (page_f > 1);
    page_n = rv.l & #1ff8;
    rv = shift_right(rv, 13, 1);
    page_r = rv.l & #7ffffff;
    rv = shift_right(rv, 27, 1);
    page_s = rv.l & #ff;
    if (page_s < 13 ∨ page_s > 48) page_bad = true;
    else if (page_s < 32) page_mask.h = 0, page_mask.l = (1 ≪ page_s) - 1;
    else page_mask.h = (1 ≪ (page_s - 32)) - 1, page_mask.l = #ffffffff;
    page_b[4] = (rv.l ≫ 8) & #f;
    page_b[3] = (rv.l ≫ 12) & #f;
    page_b[2] = (rv.l ≫ 16) & #f;
    page_b[1] = (rv.l ≫ 20) & #f;
  }
```

This code is used in section 329.
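The unpacking of rV can also be expressed with a single 64-bit integer instead of the two-tetra **octa** representation. The following is a minimal sketch under that assumption; the struct `page_vars` and the function `unpack_rv` are hypothetical names, and setting the mask to zero for a bad $s$ is a choice of this sketch (the code above leaves *page\_mask* unchanged in that case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the unpacked fields of rV. */
typedef struct {
    unsigned f;       /* the 3-bit f field */
    unsigned n8;      /* the 10-bit n field, times 8 */
    uint32_t r;       /* the 27-bit r field */
    unsigned s;       /* the 8-bit s field */
    unsigned b[5];    /* the 4-bit b fields; b[0] = 0 */
    uint64_t mask;    /* the least significant s bits */
    bool bad;         /* does rV violate the rules? */
} page_vars;

page_vars unpack_rv(uint64_t rv)
{
    page_vars v;
    v.f = (unsigned)(rv & 7);
    v.bad = (v.f > 1);
    v.n8 = (unsigned)(rv & 0x1ff8);
    v.r = (uint32_t)((rv >> 13) & 0x7ffffff);
    v.s = (unsigned)((rv >> 40) & 0xff);
    if (v.s < 13 || v.s > 48) {
        v.bad = true;
        v.mask = 0;                          /* sketch-only convention */
    } else {
        v.mask = (UINT64_C(1) << v.s) - 1;
    }
    v.b[0] = 0;
    v.b[4] = (unsigned)((rv >> 48) & 0xf);
    v.b[3] = (unsigned)((rv >> 52) & 0xf);
    v.b[2] = (unsigned)((rv >> 56) & 0xf);
    v.b[1] = (unsigned)((rv >> 60) & 0xf);
    assert((v.n8 & 7) == 0);                 /* n times 8 is a multiple of 8 */
    return v;
}
```

The shift amounts 13 and 40 correspond to the two successive calls of *shift\_right* by 13 and 27 in the code above.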

**240.** Here's how we compute a tag of the IT-cache or DT-cache from a virtual address, and how we compute a physical address from a translation found in the cache.

```
#define trans_key(addr) incr(oandn(addr, page_mask), page_n)

⟨Internal prototypes 13⟩ +=
    static octa phys_addr ARGS((octa, octa));

241. ⟨Subroutines 14⟩ +=
    static octa phys_addr(virt, trans)
        octa virt, trans;
    { octa t;
        t = oandn(trans, page_mask); /* zero out the ynp fields of a PTE */
        return oplus(t, oand(virt, page_mask));
    }
```
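As a worked illustration of these two computations, here is a sketch that uses plain 64-bit integers in place of **octa** values and passes *page\_mask* and *page\_n* as explicit parameters rather than globals. The function names follow the text, but the signatures are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Tag for a translation cache: the page part of the virtual address,
   with the (scaled) address space number n added in the low bits. */
uint64_t trans_key(uint64_t virt, uint64_t page_mask, unsigned page_n)
{
    assert((page_mask & (page_mask + 1)) == 0);   /* mask must be 2^s - 1 */
    return (virt & ~page_mask) + page_n;
}

/* Physical address: page frame from the translation (low fields zeroed)
   plus the offset of the virtual address within its page. */
uint64_t phys_addr(uint64_t virt, uint64_t trans, uint64_t page_mask)
{
    assert((page_mask & (page_mask + 1)) == 0);
    return (trans & ~page_mask) + (virt & page_mask);
}
```

For example, with $s=13$ the mask is `0x1fff`, so virtual address `0x12345` has offset `0x345`; a translation `0xABC007` (whose low three bits are protection bits) gives physical address `0xABC345`.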

**242.** Cheap (and slow) versions of MMIX leave the page table calculations to software. If the global variable *no\_hardware\_PT* is set true, *fill\_from\_virt* begins its actions in state 1, not state 0. (See the RESUME\_TRANS operation.)

⟨External variables 4⟩ +≡

Extern bool no\_hardware\_PT;

```
ARGS = macro, §6.
b: spec, §44.
b, MMIX-DOC §45.
bool = enum, §11.
cache = struct, §167.
cacheblock = struct, §167.
copy_in_time: int, §167.
coroutine = struct, §23.
data: register control *, §124.
DPTco: coroutine [], §235.
DTcache: cache *, §168.
Extern = macro, §4.
f: register int, §75.
fill_from_mem = 95, §129.
fill_from_S = 94, §129.
fill_from_virt = 93, §129.
fill_lock: lockvar, §167.
h: tetra, §17.
inbuf: cacheblock, §167.
incr: octa (), MMIX-ARITH §6.
IPTco: coroutine [], §235.
ITcache: cache *, §168.
known: bool, §40.
l: tetra, §17.
load_cache: static void (), §201.
lock: lockvar, §167.
n, MMIX-DOC §45.
o: octa, §40.
oand: octa (), MMIX-ARITH §25.
oandn: octa (), MMIX-ARITH §25.
octa = struct, §17.
oplus: octa (), MMIX-ARITH §5.
p: specnode *, §40.
ptr_a: void *, §44.
ptr_b: void *, §44.
ptr_c: void *, §44.
r, MMIX-DOC §45.
RESUME_TRANS = 3, §320.
s, MMIX-DOC §45.
self: register coroutine *, §124.
set_lock = macro (), §37.
shift_right: octa (), MMIX-ARITH §7.
state: int, §44.
terminate: label, §125.
true = 1, §11.
wait = macro (), §125.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
```

**243.** Note: The operating system is supposed to ensure that changes to the page table entries do not appear in the pipeline when a translation cache is being updated. The internal LDPTP and LDPTE instructions use only the "hot state" of the memory system.

```
⟨Start up auxiliary coroutines to compute the page table entry 243⟩ ≡
   aaaaa = data→y.o;
   i = aaaaa.h ≫ 29;   /* the segment number */
   aaaaa.h &= #1fffffff;   /* the address within segment i */
   aaaaa = shift_right(aaaaa, page_s, 1);   /* the page address */
   for (j = 0; aaaaa.l ≠ 0 ∨ aaaaa.h ≠ 0; j++) {
     co[2*j].ctl→z.o.h = 0, co[2*j].ctl→z.o.l = (aaaaa.l & #3ff) ≪ 3;
     aaaaa = shift_right(aaaaa, 10, 1);
   }
   if (page_b[i+1] < page_b[i] + j)   /* address too large */
     ;   /* nothing needs to be done, since data→b.o is zero */
   else {
     if (j ≡ 0) j = 1, co[0].ctl→z.o = zero_octa;
     ⟨Issue j pseudo-instructions to compute a page table entry 244⟩;
   }
```

This code is used in section 237.
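The digit extraction performed by this section can be sketched with 64-bit integers as follows. The names `split_virtual` and `address_too_large` are hypothetical; the logic mirrors the loop above, producing the scaled digits $8a_0, 8a_1, \ldots$ and the count $j$, and then applying the same size test against the *b* fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split a virtual address into its segment number and the base-1024 digits
   of its page address, each digit scaled by 8 (the size of a table entry).
   Returns the number j of digits stored in a8[0..j-1]; j is 0 for page 0. */
int split_virtual(uint64_t virt, unsigned s, unsigned *segment, unsigned a8[5])
{
    assert(s >= 13 && s <= 48);   /* the legal range of the s field of rV */
    *segment = (unsigned)(virt >> 61);
    uint64_t page = (virt & UINT64_C(0x1fffffffffffffff)) >> s;
    int j = 0;
    for (; page != 0; j++) {
        a8[j] = (unsigned)(page & 0x3ff) << 3;
        page >>= 10;
    }
    return j;
}

/* Segment i owns page table levels b[i] .. b[i+1]-1, so an address that
   needs j levels fits only if b[i] + j does not exceed b[i+1]. */
bool address_too_large(const unsigned page_b[5], unsigned i, int j)
{
    return page_b[i + 1] < page_b[i] + (unsigned)j;
}
```

Since $s\ge13$ leaves at most 48 bits of page address, at most five digits are ever produced, which is why five control blocks per translation cache suffice.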

**244.** The first stage of coroutine  $c_j$  is co[2\*j]. It will pass the jth control block to the second stage, co[2\*j+1], which will load page table information from memory (or hopefully from the D-cache).

```
⟨Issue j pseudo-instructions to compute a page table entry 244⟩ ≡
  j--;
  aaaaa.l = page_r + page_b[i] + j;
  co[2*j].ctl→y.p = Λ;
  co[2*j].ctl→y.o = shift_left(aaaaa, 13);
  co[2*j].ctl→y.o.h += sign_bit;
  for (;; j--) {
    co[2*j].ctl→x.o = zero_octa;  co[2*j].ctl→x.known = false;
    co[2*j].ctl→owner = &co[2*j];
    startup(&co[2*j], 1);
    if (j ≡ 0) break;
    co[2*(j-1)].ctl→y.p = &co[2*j].ctl→x;
  }
  data→b.p = &co[0].ctl→x;
This code is used in section 243.
```

**245.** At this point the translation of the given virtual address  $data \rightarrow y.o$  is the octabyte  $data \rightarrow b.o$ . Its least significant three bits are the protection code  $p = p_r p_w p_x$ ; its page address field is scaled by  $2^s$ . It is entirely zero, including the protection bits, if there was a page table failure.

The z field of the caller receives this translation.

```
⟨Compute the new entry for c→inbuf and give the caller a sneak preview 245⟩ ≡
  c→inbuf.tag = trans_key(data→y.o);
  c→inbuf.data[0] = data→b.o;
  if (cc) {
    cc→ctl→z.o = data→b.o;
    awaken(cc, 1);
  }
```

This code is used in section 237.

```
aaaaa: octa, §237.
awaken = macro (), §125.
b: spec, §44.
c: register cache *, §237.
cc: register coroutine *, §237.
co: register coroutine *, §237.
ctl: control *, §23.
data: register control *, §124.
data: octa *, §167.
false = 0, §11.
h: tetra, §17.
i: register int, §12.
inbuf: cacheblock, §167.
j: register int, §12.
known: bool, §40.
l: tetra, §17.
o: octa, §40.
owner: coroutine *, §44.
p: specnode *, §40.
page_b: int [], §238.
page_r: int, §238.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §80.
startup: static void (), §31.
tag: octa, §167.
trans_key = macro (), §240.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
```

**246.** The write buffer. The dispatcher has arranged things so that speculative stores into memory are recorded in a doubly linked list leading upward from *mem*. When such instructions finally are committed, they enter the "write buffer," which holds octabytes that are ready to be written into designated physical memory addresses (or into the D-cache and/or S-cache). The "hot state" of the computation is reflected not only by the registers and caches but also by the instructions that are pending in the write buffer.

```
⟨ Type definitions 11⟩ +≡
typedef struct {
    octa o; /* data to be stored */
    octa addr; /* its physical address */
    tetra stamp; /* when last committed (mod 2^32) */
    internal_opcode i; /* is this write special? */
    int size; /* parameter for spec_write */
} write_node;
```

**247.** We represent the buffer in the usual way as a circular list, with elements  $write\_tail + 1$ ,  $write\_tail + 2$ , ...,  $write\_head$ .

The data will sit at least *holding\_time* cycles before it leaves the write buffer. This speeds things up when different fields of the same octabyte are being stored by different instructions.
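The circular-list discipline and the holding-time test can be sketched with array indices instead of node pointers. This is an illustration only: `WBUF_NODES`, `wnode`, `wbuf`, and the `wbuf_*` functions are hypothetical names, and the sketch omits locks, the special *i* codes, and the merging of stores to the same address. As in the simulator, new items enter at the tail, the oldest item sits at the head, and both indices move downward with wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WBUF_NODES 4   /* hypothetical size; the buffer holds WBUF_NODES - 1 items */

typedef struct { uint64_t addr, o; uint32_t stamp; } wnode;
typedef struct { wnode node[WBUF_NODES]; int head, tail; } wbuf;

/* Step one position downward, wrapping from the bottom to the top. */
static int pred(int k) { return k == 0 ? WBUF_NODES - 1 : k - 1; }

void wbuf_init(wbuf *w) { w->head = w->tail = WBUF_NODES - 1; }

bool wbuf_empty(const wbuf *w) { return w->head == w->tail; }

/* Enter a new item at the tail; fails if the buffer is full. */
bool wbuf_put(wbuf *w, uint64_t addr, uint64_t o, uint32_t now)
{
    int p = pred(w->tail);
    if (p == w->head) return false;          /* the write buffer is full */
    w->node[w->tail] = (wnode){ addr, o, now };
    w->tail = p;
    return true;
}

/* Has the oldest item aged at least holding_time cycles?  Unsigned
   subtraction gives the right age even when the clock wraps mod 2^32. */
bool wbuf_ready(const wbuf *w, uint32_t now, uint32_t holding_time)
{
    assert(!wbuf_empty(w));
    return (uint32_t)(now - w->node[w->head].stamp) >= holding_time;
}

/* Remove and return the oldest item. */
wnode wbuf_take(wbuf *w)
{
    assert(!wbuf_empty(w));
    wnode n = w->node[w->head];
    w->head = pred(w->head);
    return n;
}
```

One slot is always left unused so that `head == tail` can mean "empty" without a separate counter; this matches the fullness test used when stores are committed.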

```
⟨External variables 4⟩ +≡
  Extern write_node *wbuf_bot, *wbuf_top;   /* least and greatest write buffer nodes */
  Extern write_node *write_head, *write_tail;   /* front and rear of the write buffer */
  Extern lockvar wbuf_lock;   /* is the data in write_head being written? */
  Extern int holding_time;   /* minimum holding time */
  Extern lockvar speed_lock;   /* should we ignore holding_time? */

248. ⟨Global variables 20⟩ +≡
  coroutine write_co;   /* coroutine that empties the write buffer */
  control write_ctl;   /* its control block */

249. ⟨Initialize everything 22⟩ +≡
  write_co.ctl = &write_ctl;
  write_co.name = "Write";
  write_co.stage = write_from_wbuf;
  write_ctl.ptr_a = (void *) &mem;
  write_ctl.go.o.l = 4;
  startup(&write_co, 1);
  write_head = write_tail = wbuf_top;

250. ⟨Internal prototypes 13⟩ +≡
  static void print_write_buffer ARGS((void));

251. ⟨Subroutines 14⟩ +≡
  static void print_write_buffer()
  {
    printf("Write buffer");
    if (write_head ≡ write_tail) printf(" (empty)");
    else { register write_node *p;
      printf(":");
      for (p = write_head; p ≠ write_tail; p = (p ≡ wbuf_bot ? wbuf_top : p - 1)) {
        printf("m[");  print_octa(p→addr);  printf("]=");  print_octa(p→o);
        if (p→i ≡ stunc) printf(" unc");
        else if (p→i ≡ sync) printf(" sync");
        printf(" (age %d)", ticks.l - p→stamp);
      }
    }
  }
```

**252.** The entire present state of the pipeline computation can be visualized by printing first the write buffer, then the reorder buffer, then the fetch buffer. This shows the progression of results from oldest to youngest, from sizzling hot to ice cold.

```
⟨ External prototypes 9⟩ +≡
Extern void print_pipe ARGS((void));

253. ⟨ External routines 10⟩ +≡
void print_pipe()
{
   print_write_buffer();
   print_reorder_buffer();
   print_fetch_buffer();
}
```

```
ARGS = macro, §6.
control = struct, §44.
coroutine = struct, §23.
ctl: control *, §23.
Extern = macro, §4.
go: specnode, §44.
internal_opcode = enum, §49.
l: tetra, §17.
lockvar = coroutine *, §37.
mem: specnode, §115.
name: char *, §23.
o: octa, §40.
octa = struct, §17.
print_fetch_buffer: static void (), §73.
print_octa: static void (), §19.
print_reorder_buffer: static void (), §63.
printf: int (), <stdio.h>.
ptr_a: void *, §44.
spec_write: extern void (), §208.
stage: int, §23.
startup: static void (), §31.
stunc = 67, §49.
sync = 79, §49.
tetra = unsigned int, §17.
ticks: Extern octa, §87.
write_from_wbuf = 92, §129.
```

**254.** The write\_search routine looks to see if any instructions ahead of a given place in the mem list of the reorder buffer are storing into a given physical address, or if there's a pending instruction in the write buffer for that address. If so, it returns a pointer to the value to be written. If not, it returns  $\Lambda$ . If the answer is currently unknown, because at least one possibly relevant physical address has not yet been computed, the subroutine returns the special code value DUNNO.

The search starts at the x.up field of a control block for a store instruction, otherwise at the  $ptr_a$  field of the control block, unless  $ptr_a$  points to a committed instruction.

The *i* field in the write buffer is usually st or pst, inherited from a store or partial store command. It may also be sync (from SYNC 1 or SYNC 3) or stunc (from STUNC).

```
#define DUNNO ((octa *) 1)   /* an impossible non-Λ pointer */
⟨Internal prototypes 13⟩ +≡
   static octa *write_search ARGS((control *, octa));
255. ⟨Subroutines 14⟩ +≡
   static octa *write_search(ctl, addr)
         control *ctl;
         octa addr;
   { register specnode *p = (ctl→mem_x ? ctl→x.up : (specnode *) ctl→ptr_a);
      register write_node *q = write_tail;
      addr.l &= -8;
      if (p ≡ &mem) goto qloop;
      if (p > &hot→x ∧ ctl < hot) goto qloop;   /* already committed */
      if (p < &ctl→x ∧ (ctl ≤ hot ∨ p > &hot→x)) goto qloop;
      for (; p ≠ &mem; p = p→up) {
         if (p→addr.h ≡ (tetra) -1) return DUNNO;
         if ((p→addr.l & -8) ≡ addr.l ∧ p→addr.h ≡ addr.h)
            return (p→known ? &(p→o) : DUNNO);
      }
   qloop: for (;;) {
         if (q ≡ write_head) return Λ;
         if (q ≡ wbuf_top) q = wbuf_bot; else q++;
         if (q→addr.l ≡ addr.l ∧ q→addr.h ≡ addr.h) return &(q→o);
      }
   }
```
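The three-way answer of *write\_search* (a pointer to the value, Λ, or DUNNO) relies on a sentinel pointer that can never address a real octabyte. The following sketch shows that idea in isolation, under simplifying assumptions: pending stores are kept in an array ordered from nearest to farthest rather than in the linked *mem* list, the write buffer phase is left out, and `pending_store` and `search_pending` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An impossible non-null pointer, playing the role of DUNNO in the text. */
#define DUNNO ((const uint64_t *)(uintptr_t)1)

typedef struct {
    bool addr_known;    /* has the physical address been computed yet? */
    uint64_t addr;
    bool value_known;   /* has the value to be stored been computed yet? */
    uint64_t value;
} pending_store;

/* Look for a pending store to the octabyte containing addr.
   Returns a pointer to its value, NULL if there is none, or DUNNO if the
   answer cannot yet be determined. */
const uint64_t *search_pending(const pending_store *st, size_t n, uint64_t addr)
{
    assert(st != NULL || n == 0);
    addr &= ~UINT64_C(7);                     /* compare octabyte addresses */
    for (size_t k = 0; k < n; k++) {
        if (!st[k].addr_known) return DUNNO;  /* might match; can't tell yet */
        if ((st[k].addr & ~UINT64_C(7)) == addr)
            return st[k].value_known ? &st[k].value : DUNNO;
    }
    return NULL;
}
```

A caller must test for `DUNNO` before dereferencing the result, just as callers of *write\_search* do.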

**256.** When we're committing new data to memory, we can update an existing item in the write buffer if it has the same physical address, unless that item is already in the process of being written out. Increasing the value of *holding\_time* will increase the chance that this economy is possible, but it will also increase the number of buffered items when writes are to different locations.

A store instruction that sets any of the eight interrupt bits rwxnkbsp will not affect memory, even if it doesn't cause an interrupt.

When "store" is followed by "store uncached" at the same address, or vice versa, we believe the most recent hint.

```
⟨Commit to memory if possible, otherwise break 256⟩ ≡
  { register write_node *q = write_tail;
    if (hot→interrupt & (F_BIT + #ff)) goto done_with_write;
    if (hot→x.addr.h & #ffff0000) {
      if (hot→op ≥ STB ∧ hot→op < STSF) q→size = (hot→op & #f) ≫ 2;
      else if (hot→op ≥ STSF ∧ hot→op < STCO) q→size = 2;
      else q→size = 3;
    }
    if (hot→i ≠ sync)
      for (;;) {
        if (q ≡ write_head) break;
        if (q ≡ wbuf_top) q = wbuf_bot; else q++;
        if (q→i ≡ sync) break;
        if (q→addr.l ≡ hot→x.addr.l ∧ q→addr.h ≡ hot→x.addr.h ∧
            (q ≠ write_head ∨ ¬wbuf_lock)) goto addr_found;
      }
    { register write_node *p = (write_tail ≡ wbuf_bot ? wbuf_top : write_tail - 1);
      if (p ≡ write_head) break;   /* the write buffer is full */
      q = write_tail;  write_tail = p;
      q→addr = hot→x.addr;
    }
  addr_found: q→o = hot→x.o;
    q→stamp = ticks.l;
    q→i = hot→i;
  done_with_write: spec_rem(&(hot→x));
    mem_slots++;
  }
This code is used in section 146.
```

```
addr: octa, §246.
addr: octa, §40.
ARGS = macro, §6.
control = struct, §44.
F_BIT = 1 ≪ 17, §54.
h: tetra, §17.
holding_time: int, §247.
hot: control *, §60.
i: internal_opcode, §246.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
known: bool, §40.
l: tetra, §17.
mem: specnode, §115.
mem_slots: int, §86.
mem_x: bool, §44.
o: octa, §246.
o: octa, §40.
octa = struct, §17.
op: mmix_opcode, §44.
pst = 66, §49.
ptr_a: void *, §44.
size: int, §246.
spec_rem: static void (), §97.
specnode = struct, §40.
st = 63, §49.
stamp: tetra, §246.
STB = #a0, §47.
STCO = #b4, §47.
STSF = #b0, §47.
stunc = 67, §49.
sync = 79, §49.
tetra = unsigned int, §17.
ticks: Extern octa, §87.
up: specnode *, §40.
wbuf_bot: write_node *, §247.
wbuf_lock: lockvar, §247.
wbuf_top: write_node *, §247.
write_head: write_node *, §247.
write_node = struct, §246.
write_tail: write_node *, §247.
x: specnode, §44.
```

**257.** A special coroutine whose duty is to empty the write buffer is always active. It holds the *wbuf\_lock* while it is writing the contents of *write\_head*. It holds *Dcache→fill\_lock* while waiting for the D-cache to fill a block.

```
⟨Cases for control of special coroutines 126⟩ +≡
case write_from_wbuf: p = (cacheblock *) data→ptr_b;
  switch (data→state) {
  case 4: ⟨Forward the new data past the D-cache if it is write-through 263⟩;
     data→state = 5;
  case 5: if (write_head ≡ wbuf_bot) write_head = wbuf_top; else write_head--;
  write_restart: data→state = 0;
  case 0: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
     if (write_head ≡ write_tail) wait(1);   /* write buffer is empty */
     if (write_head→i ≡ sync) ⟨Ignore the item in write_head 264⟩;
     if (write_head→addr.h & #ffff0000) goto mem_direct;
     if (ticks.l - write_head→stamp < holding_time ∧ ¬speed_lock) wait(1);   /* data too raw */
     if (¬Dcache) goto mem_direct;   /* not cached */
     if (Dcache→lock ∨ (j = get_reader(Dcache)) < 0) wait(1);   /* D-cache busy */
     startup(&Dcache→reader[j], Dcache→access_time);
     ⟨Write the data into the D-cache and set state = 4, if there's a cache hit 262⟩;
     data→state = ((Dcache→mode & WRITE_ALLOC) ∧ write_head→i ≠ stunc ? 1 : 3);
     wait(Dcache→access_time);
  case 1: ⟨Try to put the contents of location write_head→addr into the D-cache 261⟩;
     data→state = 2;  sleep;
  case 2: data→state = 0;  sleep;   /* wake up when the D-cache has the block */
  case 3: ⟨Handle write-around when writing to the D-cache 259⟩;
  mem_direct: ⟨Write directly from write_head to memory 260⟩;
  }
```

**258.**

```
⟨Local variables 12⟩ +≡
  register cacheblock *p, *q;
```

**259.** The granularity is guaranteed to be 8 in write-around mode (see *MMIX\_config*). Although an uncached store will not be stored in the D-cache (unless it hits in the D-cache), it will go into a secondary cache.

```
⟨Handle write-around when writing to the D-cache 259⟩ ≡
  if (Dcache→flusher.next) wait(1);
  Dcache→outbuf.tag.h = write_head→addr.h;
  Dcache→outbuf.tag.l = write_head→addr.l & (-Dcache→bb);
  for (j = 0; j < Dcache→bb ≫ Dcache→g; j++) Dcache→outbuf.dirty[j] = false;
  Dcache→outbuf.data[(write_head→addr.l & (Dcache→bb - 1)) ≫ 3] = write_head→o;
  Dcache→outbuf.dirty[(write_head→addr.l & (Dcache→bb - 1)) ≫ Dcache→g] = true;
  set_lock(self, wbuf_lock);
  startup(&Dcache→flusher, Dcache→copy_out_time);
  data→state = 5;  wait(Dcache→copy_out_time);
```

This code is used in section 257.

```
260. ⟨Write directly from write_head to memory 260⟩ ≡
  if (mem_lock) wait(1);
  set_lock(self, wbuf_lock);
  set_lock(&mem_locker, mem_lock);   /* a coroutine of type vanish */
  startup(&mem_locker, mem_addr_time + mem_write_time);
  if (write_head→addr.h & #ffff0000)
    spec_write(write_head→addr, write_head→o, write_head→size);
  else mem_write(write_head→addr, write_head→o);
  data→state = 5; wait(mem_addr_time + mem_write_time);
This code is used in section 257.
```

**261.** A subtlety needs to be mentioned here: While we're trying to update the D-cache, another instruction might be filling the same cache block (although not because of the same physical address). Therefore we **goto** write\_restart here instead of saying wait(1).

```
⟨Try to put the contents of location write_head→addr into the D-cache 261⟩ ≡
  if (Dcache→filler.next) goto write_restart;
  if ((Scache ∧ Scache→lock) ∨ (¬Scache ∧ mem_lock)) goto write_restart;
  p = alloc_slot(Dcache, write_head→addr);
  if (¬p) goto write_restart;
  if (Scache) set_lock(&Dcache→filler, Scache→lock)
  else set_lock(&Dcache→filler, mem_lock);
  set_lock(self, Dcache→fill_lock);
  data→ptr_b = Dcache→filler_ctl.ptr_b = (void *) p;
  Dcache→filler_ctl.z.o = write_head→addr;
  startup(&Dcache→filler, Scache ? Scache→access_time : mem_addr_time);
```

This code is used in section 257.

```
access_time: int, §167.
addr: octa, §246.
alloc_slot: static cacheblock
  *(), §205.
bb: int, §167.
cacheblock = struct, \S 167.
copy\_out\_time: int, §167.
data: register control *,
  §124.
data: octa *, §167.
Dcache: cache *, §168.
dirty: char *, §167.
false = 0, §11.
fill_lock: lockvar, §167.
filler: coroutine, \S 167.
filler_ctl: control, §167.
flusher: coroutine, §167.
g: int, §167.
get_reader: static int (), §183.
h: tetra, §17.
holding_time: int, §247.
i: internal_opcode, §246.
j: register int, §12.
l: tetra, §17.
```

```
lock: lockvar, §167.
lockloc: coroutine **, §23.
mem\_addr\_time: int, §214.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem\_write: void (), §213.
mem\_write\_time: int, §214.
MMIX\_config: void (),
 MMIX-CONFIG §38.
mode: int, §167.
next: coroutine *, §23.
o: octa, §246.
o: octa, §40.
outbuf: cacheblock, §167.
p: register write_node *,
  §256.
ptr_b: void *, §44.
reader: coroutine *, §167.
Scache: cache *, §168.
self: register coroutine *,
  §124.
set\_lock = macro(), §37.
size: int, §246.
sleep = macro, \S 125.
```

```
spec_write: extern void (),
  §208.
speed_lock: lockvar, §247.
stamp: tetra, §246.
startup: static void (), §31.
state: int, §44.
stunc = 67, \S 49.
sync = 79, \S 49.
tag: octa, §167.
ticks: Extern octa, §87.
true = 1, \S 11.
vanish = 98, \S 129.
wait = macro(), \S 125.
wbuf\_bot: write_node *, §247.
wbuf\_lock: lockvar, §247.
wbuf\_top: write\_node *, §247.
WRITE_ALLOC = 2, §166.
write\_from\_wbuf = 92, \S 129.
write_head: write_node *,
  §247.
write_tail: write_node *,
  §247.
z: spec, \S44.
```

**262.** Here it is assumed that *Dcache-access\_time* is enough to search the D-cache and update one octabyte in case of a hit. The D-cache is not locked, since other coroutines that might be simultaneously reading the D-cache are not going to use the octabyte that changes. Perhaps the simulator is being too lenient here.

```
⟨Write the data into the D-cache and set state = 4, if there's a cache hit 262⟩ ≡
  p = cache_search(Dcache, write_head→addr);
  if (p) {
    p = use_and_fix(Dcache, p);
    set_lock(self, wbuf_lock);
    data→ptr_b = (void *) p;
    p→data[(write_head→addr.l & (Dcache→bb - 1)) ≫ 3] = write_head→o;
    p→dirty[(write_head→addr.l & (Dcache→bb - 1)) ≫ Dcache→g] = true;
    data→state = 4; wait(Dcache→access_time);
  }
This code is used in section 257.
263. ⟨Forward the new data past the D-cache if it is write-through 263⟩ ≡
  if ((Dcache→mode & WRITE_BACK) ≡ 0) {   /* write-through */
    if (Dcache→flusher.next) wait(1);
    flush_cache(Dcache, p, true);
  }
This code is used in section 257.
264. ⟨Ignore the item in write_head 264⟩ ≡
  {
    set_lock(self, wbuf_lock);
    data→state = 5;
    wait(1);
  }
This code is used in section 257.
```
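The two mode tests made by the *write\_from\_wbuf* coroutine can be summarized as a pair of small predicates. This is only a sketch: the function names are hypothetical, while the numeric values of `WRITE_BACK`, `WRITE_ALLOC`, and `stunc` are those listed in the mini-indexes for §166 and §49.

```c
#include <stdbool.h>

enum { WRITE_BACK = 1, WRITE_ALLOC = 2 }; /* cache mode bits, per §166 */
enum { stunc = 67 };                      /* internal opcode, per §49 */

/* After a D-cache miss: state 1 tries to fill the block (write-allocate),
   state 3 writes around the cache. Uncached stores never allocate. */
static int state_after_miss(int mode, int internal_opcode)
{
    return ((mode & WRITE_ALLOC) && internal_opcode != stunc) ? 1 : 3;
}

/* After a D-cache hit: a write-through cache forwards the new data at once. */
static bool must_forward_after_hit(int mode)
{
    return (mode & WRITE_BACK) == 0;
}
```

Keeping the two decisions separate mirrors the program: allocation policy matters only on a miss, and the write-back bit matters only on a hit.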

access\_time: int, §167.
addr: octa, §246.
bb: int, §167.
cache\_search: static
cacheblock \*(), §193.
data: octa \*, §167.
data: register control \*,
§124.
Dcache: cache \*, §168.
dirty: char \*, §167.
flush\_cache: static void (), §203.
flusher: coroutine, §167.
g: int, §167.
l: tetra, §17.
mode: int, §167.
next: coroutine \*, §23.
o: octa, §246.
p: register cacheblock \*, §258.
ptr\_b: void \*, §44.
self: register coroutine \*, §124.
set\_lock = macro (), §37.
state: int, §44.
true = 1, §11.
use\_and\_fix: static cacheblock \*(), §196.
wait = macro (), §125.
wbuf\_lock: lockvar, §247.
WRITE\_BACK = 1, §166.
write\_head: write\_node \*, §247.

**265.** Loading and storing. A RISC machine is often said to have a "load/store architecture," perhaps because loading and storing are among the most difficult things a RISC machine is called upon to do.

We want memory accesses to be efficient, so we try to access the D-cache at the same time as we are translating a virtual address via the DT-cache. Usually we hit in both caches, but numerous cases must be dealt with when we miss. Is there an elegant way to handle all the contingencies? Alas, the author of this program was unable to think of anything better than to throw lots of code at the problem — knowing full well that such a spaghetti-like approach is fraught with possibilities for error.

Instructions like LDO x, y, z operate in two pipeline stages. The first stage computes the virtual address y + z, waiting if necessary until y and z are both known; then it starts to access the necessary caches. In the second stage we ascertain the corresponding physical address and hopefully find the data in the cache (or in the speculative mem list or the write buffer).

An instruction like STB x, y, z shares some of the computation of LDO x, y, z, because only one byte is being stored but the other seven bytes must be found in the cache. In this case, however, x is treated as an input, and mem is the output. The second stage of a store command can begin even though x is not known during the first stage.
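The virtual address y + z is a 64-bit sum, and the simulator represents each octabyte as two tetrabytes h and l. The sketch below shows how such a sum can be formed with an explicit carry; `octa_add` is a hypothetical stand-in written for illustration, not the *oplus* routine of MMIX-ARITH itself.

```c
#include <stdint.h>

typedef uint32_t tetra;
typedef struct { tetra h, l; } octa; /* high tetrabyte, then low tetrabyte */

/* Hypothetical stand-in for an octabyte adder: x = (y + z) mod 2^64. */
static octa octa_add(octa y, octa z)
{
    octa x;
    x.l = y.l + z.l;               /* wraps modulo 2^32 */
    x.h = y.h + z.h + (x.l < y.l); /* add the carry out of the low half */
    return x;
}
```

The comparison `x.l < y.l` is true exactly when the low-order addition overflowed, so no wider integer type is needed.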

Here's what we do at the beginning of stage 1.

```
#define ld_st_launch 7   /* state when load/store command has its memory address */
⟨Cases to compute the virtual address of a memory operation 265⟩ ≡
case preld: case prest: case prego:
  data→z.o = incr(data→z.o, data→xx & -(data→i ≡ prego ? Icache : Dcache)→bb);
    /* (I hope the adder is fast enough) */
case ld: case ldunc: case ldvts: case st: case pst: case syncd: case syncid:
start_ld_st: data→y.o = oplus(data→y.o, data→z.o);
  data→state = ld_st_launch; goto switch1;
case ldptp: case ldpte: if (data→y.o.h) goto start_ld_st;
  data→x.o = zero_octa; data→x.known = true; goto die;   /* page table fault */
This code is used in section 132.
266.
#define PRW_BITS (data→i < st ? PR_BIT : data→i ≡ pst ? PR_BIT + PW_BIT :
        (data→i ≡ syncid ∧ (data→loc.h & sign_bit)) ? 0 : PW_BIT)
⟨Special cases for states in the first stage 266⟩ ≡
case ld_st_launch: if ((self + 1)→next) wait(1);   /* second stage must be clear */
  ⟨Handle special cases for operations like prego and ldvts 289⟩;
  if (data→y.o.h & sign_bit) ⟨Do load/store stage 1 with known physical address 271⟩;
  if (page_bad) {
    if (data→i < preld ∨ data→i ≡ st ∨ data→i ≡ pst) data→interrupt |= PRW_BITS;
    goto fin_ex;
  }
  if (DTcache→lock ∨ (j = get_reader(DTcache)) < 0) wait(1);
  startup(&DTcache→reader[j], DTcache→access_time);
  ⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩;
  pass_after(DTcache→access_time); goto passit;
See also sections 310, 326, 360, and 363.
This code is used in section 130.
```
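The nested conditional in PRW\_BITS is easier to read when unrolled. The following sketch restates it as a function; `prw_bits` and its parameter `loc_is_negative` (standing for a nonzero `data→loc.h & sign_bit`) are hypothetical names, and the numeric constants are the ones listed in the mini-indexes for §49 and §54.

```c
enum { ld = 56, st = 63, syncid = 65, pst = 66 }; /* internal opcodes, per §49 */
enum { PR_BIT = 1 << 7, PW_BIT = 1 << 6 };        /* interrupt codes, per §54 */

/* Hypothetical function form of PRW_BITS: the permissions that an
   operation with internal opcode i must hold on the page it touches. */
static int prw_bits(int i, int loc_is_negative)
{
    if (i < st) return PR_BIT;            /* opcodes below st only read */
    if (i == pst) return PR_BIT + PW_BIT; /* partial store reads, then writes */
    if (i == syncid && loc_is_negative) return 0;
    return PW_BIT;                        /* everything else needs to write */
}
```

A partial store such as STB needs both bits because the unchanged bytes of the octabyte must first be read.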

**267.** When stage 2 of a load/store command begins, the state will depend on what transpired in stage 1. For example, *data→state* will be *DT\_miss* if the virtual address key can't be found in the DT-cache; then stage 2 will have to compute the physical address the hard way.

The *data→state* will be *DT\_hit* if the physical address is known via the DT-cache, but the data may or may not be in the D-cache. The *data→state* will be *hit\_and\_miss* if the DT-cache hits and the D-cache doesn't. And *data→state* will be *ld\_ready* if *data→x.o* is the desired octabyte (for example, if both caches hit).

```
#define DT_miss 10        /* second stage state when DT-cache doesn't hold the key */
#define DT_hit 11         /* second stage state when physical address is known */
#define hit_and_miss 12   /* second stage state when D-cache misses */
#define ld_ready 13       /* second stage state when data has been read */
#define st_ready 14       /* second stage state when data needn't be read */
#define prest_win 15      /* second stage state when we can fill a block with zeroes */
⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩ ≡
  p = cache_search(DTcache, trans_key(data→y.o));
  if (¬Dcache ∨ Dcache→lock ∨ (j = get_reader(Dcache)) < 0 ∨ (data→i ≥ st ∧ data→i ≤ syncid))
    ⟨Do load/store stage 1 without D-cache lookup 270⟩;
  startup(&Dcache→reader[j], Dcache→access_time);
  if (p) ⟨Do a simultaneous lookup in the D-cache 268⟩
  else data→state = DT_miss;
```

This code is used in section 266.
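As a rough summary of how stage 1 chooses among these states, here is a simplified sketch. It deliberately ignores the write buffer, the protection checks, and spurious D-cache lookups, all of which the real program handles; the function `stage1_state` and its boolean parameters are hypothetical.

```c
#include <stdbool.h>

enum { DT_miss = 10, DT_hit = 11, hit_and_miss = 12, ld_ready = 13, st_ready = 14 };

/* Simplified, hypothetical classification of the state left for stage 2.
   dt_hit:     the DT-cache held the translation key
   d_searched: the D-cache was looked up during stage 1
   d_hit:      that lookup found the block
   store_like: the operation needn't read data first */
static int stage1_state(bool dt_hit, bool d_searched, bool d_hit, bool store_like)
{
    if (!dt_hit) return DT_miss;
    if (!d_searched) return store_like ? st_ready : DT_hit;
    return d_hit ? ld_ready : hit_and_miss;
}
```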

```
access_time: int, §167.
bb: int, §167.
cache_search: static cacheblock *(), §193.
data: register control *, §124.
Dcache: cache *, §168.
die: label, §144.
DTcache: cache *, §168.
fin_ex: label, §144.
get_reader: static int (), §183.
h: tetra, §17.
i: internal_opcode, §44.
Icache: cache *, §168.
incr: octa (), MMIX-ARITH §6.
interrupt: unsigned int, §44.
j: register int, §12.
known: bool, §40.
ld = 56, §49.
ldpte = 58, §49.
ldptp = 57, §49.
ldunc = 59, §49.
ldvts = 60, §49.
loc: octa, §44.
lock: lockvar, §167.
mem: specnode, §115.
next: coroutine *, §23.
o: octa, §40.
oplus: octa (), MMIX-ARITH §5.
p: register cacheblock *, §258.
page_bad: bool, §238.
pass_after = macro (), §125.
passit: label, §134.
PR_BIT = 1 ≪ 7, §54.
prego = 73, §49.
preld = 61, §49.
prest = 62, §49.
pst = 66, §49.
PW_BIT = 1 ≪ 6, §54.
reader: coroutine *, §167.
self: register coroutine *, §124.
sign_bit = macro, §80.
st = 63, §49.
startup: static void (), §31.
state: int, §44.
switch1: label, §130.
syncd = 64, §49.
syncid = 65, §49.
trans_key = macro (), §240.
true = 1, §11.
wait = macro (), §125.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
```

**268.** We assume that it is possible to look up a virtual address in the DT-cache at the same time as we look for a corresponding physical address in the D-cache, provided that the lower $b+c$ bits of the two addresses are the same. (They will always be the same if $b+c \leq$ *page\_s*; otherwise the operating system can try to make them the same by "page coloring" whenever possible.) If both caches hit, the physical address is known in max(*DTcache→access\_time*, *Dcache→access\_time*) cycles.

If the lower $b+c$ bits of the virtual and physical addresses differ, the machine will not know this until the DT-cache has hit. Therefore we simulate the operation of accessing the D-cache, but we go to *DT\_hit* instead of to *hit\_and\_miss* because the D-cache will experience a spurious miss.

```
#define max(x, y) ((x) < (y) ? (y) : (x))
⟨Do a simultaneous lookup in the D-cache 268⟩ ≡
  { octa *m;
    p = use_and_fix(DTcache, p), data→z.o = p→data[0];
    ⟨Check the protection bits and get the physical address 269⟩;
    m = write_search(data, data→z.o);
    if (m ≡ DUNNO) data→state = DT_hit;
    else if (m) data→x.o = *m, data→state = ld_ready;
    else if (Dcache→b + Dcache→c > page_s ∧
        ((data→y.o.l ⊕ data→z.o.l) & ((Dcache→bb ≪ Dcache→c) - (1 ≪ page_s))))
      data→state = DT_hit;   /* spurious D-cache lookup */
    else {
      q = cache_search(Dcache, data→z.o);
      if (q) {
        if (data→i ≡ ldunc) q = demote_and_fix(Dcache, q);
        else q = use_and_fix(Dcache, q);
        data→x.o = q→data[(data→z.o.l & (Dcache→bb - 1)) ≫ 3];
        data→state = ld_ready;
      } else data→state = hit_and_miss;
    }
    pass_after(max(DTcache→access_time, Dcache→access_time));
    goto passit;
  }
```

This code is used in section 267.
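The test for a spurious D-cache lookup compares only those address bits that lie above the page offset but still take part in selecting a cache set. A minimal sketch of that test follows; the function `spurious_lookup` is a hypothetical name, `b` and `c` are taken to be the base-2 logarithms of the block size and of the number of sets, and the sketch assumes b + c < 32 so that everything fits in one tetrabyte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the virtual and physical addresses disagree
   in a bit that the D-cache uses for indexing but the page offset does not
   cover, so that a simultaneous lookup would have probed the wrong set. */
static bool spurious_lookup(unsigned b, unsigned c, unsigned page_s,
                            uint32_t virt_l, uint32_t phys_l)
{
    uint32_t bb, mask;
    assert(b + c < 32);               /* assumption of this sketch */
    if (b + c <= page_s) return false; /* index lies within the page offset */
    bb = (uint32_t)1 << b;             /* block size in bytes */
    mask = (bb << c) - ((uint32_t)1 << page_s);
    return ((virt_l ^ phys_l) & mask) != 0;
}
```

For example, with 64-byte blocks (b = 6), 256 sets (c = 8), and 8K pages (page\_s = 13), only bit 13 is examined; page coloring amounts to making that bit agree between the two addresses.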

**269.** The protection bits $p_r p_w p_x$ in a translation cache are shifted four positions right from the interrupt codes PR\_BIT, PW\_BIT, PX\_BIT. If the data is protected, we abort the load/store operation immediately; this protects the privacy of other users.

```
⟨Check the protection bits and get the physical address 269⟩ ≡
  if (data→stack_alert) {
    if (data→z.o.l & (PW_BIT ≫ PROT_OFFSET)) data→stack_alert = false;
    else data→z.o = g[rC].o;   /* use the continuation page for stack overflow */
  }
  j = PRW_BITS;
  if (((data→z.o.l ≪ PROT_OFFSET) & j) ≠ j) {
    if (data→i ≡ syncd ∨ data→i ≡ syncid) goto sync_check;
    if (data→i ≠ preld ∧ data→i ≠ prest)
      data→interrupt |= j & ∼(data→z.o.l ≪ PROT_OFFSET);
    data→stack_alert = false; goto fin_ex;
  }
  data→z.o = phys_addr(data→y.o, data→z.o);
```

This code is used in sections 268, 270, and 272.

```
270. ⟨Do load/store stage 1 without D-cache lookup 270⟩ ≡
  { octa *m;
    if (p) {
      p = use_and_fix(DTcache, p), data→z.o = p→data[0];
      ⟨Check the protection bits and get the physical address 269⟩;
      if (data→i ≥ st ∧ data→i ≤ syncid) data→state = st_ready;
      else {
        m = write_search(data, data→z.o);
        if (m ∧ m ≠ DUNNO) data→x.o = *m, data→state = ld_ready;
        else data→state = DT_hit;
      }
    } else data→state = DT_miss;
    pass_after(DTcache→access_time); goto passit;
  }
```

This code is used in section 267.
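The protection test can be exercised on its own. In the sketch below, `missing_permissions` is a hypothetical function that mirrors the comparison against PRW\_BITS: it shifts the protection bits of a translation entry left by PROT\_OFFSET so that they line up with the interrupt codes, and reports which requested permissions are absent. The constants are the values listed in the mini-indexes for §54.

```c
enum { PR_BIT = 1 << 7, PW_BIT = 1 << 6, PX_BIT = 1 << 5 }; /* per §54 */
enum { PROT_OFFSET = 5 };                                    /* per §54 */

/* Hypothetical helper: entry_l holds the low-order bits of a translation
   cache entry; need is the set of permissions the operation requires.
   Returns the interrupt bits to raise, or 0 if access is allowed. */
static unsigned missing_permissions(unsigned entry_l, unsigned need)
{
    unsigned granted = entry_l << PROT_OFFSET;
    if ((granted & need) == need) return 0;
    return need & ~granted;
}
```

A partial store to a read-only page thus raises only the write-permission interrupt, because the read permission it also needs is present.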

```
access_time: int, §167.
b: int, §167.
bb: int, §167.
c: int, §167.
cache_search: static cacheblock *(), §193.
data: octa *, §167.
data: register control *, §124.
Dcache: cache *, §168.
demote_and_fix: static cacheblock *(), §199.
DT_hit = 11, §267.
DT_miss = 10, §267.
DTcache: cache *, §168.
DUNNO = macro, §254.
false = 0, §11.
fin_ex: label, §144.
g: int, §167.
hit_and_miss = 12, §267.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
ld_ready = 13, §267.
ldunc = 59, §49.
o: octa, §40.
octa = struct, §17.
p: register cacheblock *, §258.
page_s: int, §238.
pass_after = macro (), §125.
passit: label, §134.
phys_addr: static octa (), §241.
PR_BIT = 1 ≪ 7, §54.
preld = 61, §49.
prest = 62, §49.
PROT_OFFSET = 5, §54.
PRW_BITS = macro, §266.
PW_BIT = 1 ≪ 6, §54.
PX_BIT = 1 ≪ 5, §54.
q: register cacheblock *, §258.
rC = 8, §52.
st = 63, §49.
st_ready = 14, §267.
stack_alert: bool, §44.
state: int, §44.
sync_check: label, §370.
syncd = 64, §49.
syncid = 65, §49.
use_and_fix: static cacheblock *(), §196.
write_search: static octa *(), §255.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
```

```
271. ⟨Do load/store stage 1 with known physical address 271⟩ ≡
  { octa *m;
    if (¬(data→loc.h & sign_bit)) {
      if (data→i ≡ syncd ∨ data→i ≡ syncid) goto sync_check;
      if (data→i ≠ preld ∧ data→i ≠ prest) data→interrupt = N_BIT;
      goto fin_ex;
    }
    data→z.o = data→y.o; data→z.o.h -= sign_bit;
    if (data→z.o.h & #ffff0000) {
      switch (data→i) {
      case ldvts: case preld: case prest: case prego: case syncd: case syncid:
        goto fin_ex;
      case ld: case ldunc: if (mem_lock) wait(1);
        if (data→op < LDSF) i = (data→op & #f) ≫ 2;
        else if (data→op < CSWAP) i = 2;
        else i = 3;
        data→x.o = spec_read(data→z.o, i);
        goto make_ld_ready;
      case pst:
        if ((data→op ⊕ CSWAP) ≤ 1) {
          data→x.o = spec_read(data→z.o, 3); goto make_ld_ready;
        }
        data→x.o = zero_octa;
      case st: data→state = st_ready; pass_after(1); goto passit;
      }
    } else if (data→i ≥ st ∧ data→i ≤ syncid) {
      data→state = st_ready; pass_after(1); goto passit;
    }
    m = write_search(data, data→z.o);
    if (m) {
      if (m ≡ DUNNO) data→state = DT_hit;
      else data→x.o = *m, data→state = ld_ready;
      pass_after(1); goto passit;
    } else if (¬Dcache) {
      if (mem_lock) wait(1);
      data→x.o = mem_read(data→z.o);
    make_ld_ready: set_lock(&mem_locker, mem_lock);
      data→state = ld_ready;
      startup(&mem_locker, mem_addr_time + mem_read_time);
      pass_after(mem_addr_time + mem_read_time); goto passit;
    }
    if (Dcache→lock ∨ (j = get_reader(Dcache)) < 0) {
      data→state = DT_hit; pass_after(1); goto passit;
    }
    startup(&Dcache→reader[j], Dcache→access_time);
    q = cache_search(Dcache, data→z.o);
    if (q) {
      if (data→i ≡ ldunc) q = demote_and_fix(Dcache, q);
      else q = use_and_fix(Dcache, q);
      data→x.o = q→data[(data→z.o.l & (Dcache→bb - 1)) ≫ 3];
      data→state = ld_ready;
    } else data→state = hit_and_miss;
    pass_after(Dcache→access_time); goto passit;
  }
This code is used in section 266.
```
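The size code handed to *spec\_read* is derived from the opcode alone. The sketch below isolates that computation; `special_read_size` is a hypothetical name, the values of `LDSF` and `CSWAP` are those listed in the mini-index for §47, and the reading of the result as 0, 1, 2, 3 for byte, wyde, tetra, octa is an assumption based on the layout of the LDB through LDOU opcodes.

```c
enum { LDSF = 0x90, CSWAP = 0x94 }; /* opcode values, per §47 */

/* Hypothetical helper: size code for a load-type opcode that reads
   from a special (uncached, memory-mapped) address. */
static int special_read_size(unsigned op)
{
    if (op < LDSF) return (op & 0xf) >> 2; /* LDB, LDW, LDT, LDO families */
    if (op < CSWAP) return 2;              /* LDSF and LDHT read a tetrabyte */
    return 3;                              /* everything later reads an octabyte */
}
```

Because each of the opcodes from LDB to LDOU occupies a group of four consecutive codes (signed and unsigned, each with an immediate variant), discarding the two low-order bits of the low nybble yields the size directly.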

```
access_time: int, §167.
bb: int, §167.
cache_search: static
  cacheblock *(), §193.
CSWAP = #94, \S47.
data: octa *, §167.
data: register control *,
  §124.
Dcache: cache *, §168.
demote_and_fix: static
  cacheblock *(), §199.
DT_hit = 11, \S 267.
DUNNO = macro, \S 254.
fin_ex: label, §144.
get_reader: static int (), §183.
h: tetra, §17.
hit\_and\_miss = 12, \S 267.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
ld = 56, \S 49.
ld_ready = 13, §267.
```

```
LDSF = #90, §47.
ldunc = 59, \S 49.
ldvts = 60, \S 49.
loc: octa, §44.
lock: lockvar, §167.
mem\_addr\_time: int, \S 214.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem_read: octa (), §210.
mem\_read\_time: int, \S 214.
N_BIT = 1 \ll 4, \S 54.
o: octa, §40.
octa = struct, §17.
op: mmix_opcode, §44.
pass\_after = macro(), \S 125.
passit: label, §134.
prego = 73, \S 49.
preld = 61, \S 49.
prest = 62, \S 49.
pst = 66, \S 49.
q: register cacheblock *,
  §258.
```

```
reader: coroutine *, §167.
set\_lock = macro(), \S 37.
sign\_bit = macro, \S 80.
spec_read: extern octa (),
  §208.
st = 63, \S 49.
st\_ready = 14, \S 267.
startup: static void (), §31.
state: int, §44.
sync_check: label, §370.
syncd = 64, \S 49.
syncid = 65, §49.
use_and_fix: static
  cacheblock *(), §196.
wait = macro(), \S 125.
write_search: static octa *(),
  §255.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa,
  MMIX-ARITH §4.
```

**272.** The program for the second stage is, likewise, rather long-winded, yet quite similar to the cache manipulations we have already seen several times.

Several instructions might be trying to fill the DT-cache for the same page. (A similar situation faced us in the *write\_from\_wbuf* coroutine.) The second stage therefore needs to do some translation cache searching just as the first stage did. In this stage, however, we don't go all out for speed, because DT-cache misses are rare.

```
#define DT_retry 8   /* second stage state when DT-cache should be searched again */
#define got_DT 9     /* second stage state when DT-cache entry has been computed */
⟨Special cases for states in later stages 272⟩ ≡
square_one: data→state = DT_retry;
case DT_retry: if (DTcache→lock ∨ (j = get_reader(DTcache)) < 0) wait(1);
  startup(&DTcache→reader[j], DTcache→access_time);
  p = cache_search(DTcache, trans_key(data→y.o));
  if (p) {
    p = use_and_fix(DTcache, p), data→z.o = p→data[0];
    ⟨Check the protection bits and get the physical address 269⟩;
    if (data→i ≥ st ∧ data→i ≤ syncid) data→state = st_ready;
    else data→state = DT_hit;
  } else data→state = DT_miss;
  wait(DTcache→access_time);
case DT_miss: if (DTcache→filler.next)
    if (data→i ≡ preld ∨ data→i ≡ prest) goto fin_ex; else goto square_one;
  if (no_hardware_PT ∨ page_f)
    if (data→i ≡ preld ∨ data→i ≡ prest) goto fin_ex; else goto emulate_virt;
  p = alloc_slot(DTcache, trans_key(data→y.o));
  if (¬p) goto square_one;
  data→ptr_b = DTcache→filler_ctl.ptr_b = (void *) p;
  DTcache→filler_ctl.y.o = data→y.o;
  set_lock(self, DTcache→fill_lock);
  startup(&DTcache→filler, 1);
  data→state = got_DT;
  if (data→i ≡ preld ∨ data→i ≡ prest) goto fin_ex; else sleep;
case got_DT: release_lock(self, DTcache→fill_lock);
  ⟨Check the protection bits and get the physical address 269⟩;
  if (data→i ≥ st ∧ data→i ≤ syncid) goto finish_store;
    /* otherwise we fall through to ld_retry below */
See also sections 273, 276, 279, 280, 299, 311, 354, 364, and 370.
This code is used in section 135.
```

**273.** The second stage might also want to fill the D-cache (and perhaps the S-cache) as we get the data.

Several load instructions might be trying to fill the same cache block. So we should go back and look in the D-cache again if we miss and cannot allocate a slot immediately.

A PRELD or PREST instruction, which is just a "hint," doesn't do anything more if the caches are already busy.

```
⟨Special cases for states in later stages 272⟩ +≡
ld_retry: data→state = DT_hit;
case DT_hit: if (data→i ≡ preld ∨ data→i ≡ prest) goto fin_ex;
  ⟨Check for a hit in pending writes 278⟩;
  if ((data→z.o.h & #ffff0000) ∨ ¬Dcache)
    ⟨Do load/store stage 2 without D-cache lookup 277⟩;
  if (Dcache→lock ∨ (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache→reader[j], Dcache→access_time);
  q = cache_search(Dcache, data→z.o);
  if (q) {
    if (data→i ≡ ldunc) q = demote_and_fix(Dcache, q);
    else q = use_and_fix(Dcache, q);
    data→x.o = q→data[(data→z.o.l & (Dcache→bb - 1)) ≫ 3];
    data→state = ld_ready;
  } else data→state = hit_and_miss;
  wait(Dcache→access_time);
case hit_and_miss: if (data→i ≡ ldunc) goto avoid_D;
  ⟨Try to get the contents of location data→z.o in the D-cache 274⟩;
```

```
access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
avoid_D: label, §277.
bb: int, §167.
cache_search: static cacheblock *(), §193.
data: octa *, §167.
data: register control *, §124.
Dcache: cache *, §168.
demote_and_fix: static cacheblock *(), §199.
DT_hit = 11, §267.
DT_miss = 10, §267.
DTcache: cache *, §168.
emulate_virt: label, §310.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
fin_ex: label, §144.
finish_store: label, §280.
get_reader: static int (), §183.
h: tetra, §17.
hit_and_miss = 12, §267.
i: internal_opcode, §44.
j: register int, §12.
l: tetra, §17.
ld_ready = 13, §267.
ldunc = 59, §49.
lock: lockvar, §167.
next: coroutine *, §23.
no_hardware_PT: bool, §242.
o: octa, §40.
p: register cacheblock *, §258.
page_f: int, §238.
preld = 61, §49.
prest = 62, §49.
ptr_b: void *, §44.
q: register cacheblock *, §258.
reader: coroutine *, §167.
release_lock = macro (), §37.
self: register coroutine *, §124.
set_lock = macro (), §37.
sleep = macro, §125.
st = 63, §49.
st_ready = 14, §267.
startup: static void (), §31.
state: int, §44.
syncid = 65, §49.
trans_key = macro (), §240.
use_and_fix: static cacheblock *(), §196.
wait = macro (), §125.
write_from_wbuf = 92, §129.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
```

```
274. ⟨Try to get the contents of location data→z.o in the D-cache 274⟩ ≡
  ⟨Check for prest with a fully spanned cache block 275⟩;
  if (Dcache→filler.next) goto ld_retry;
  if ((Scache ∧ Scache→lock) ∨ (¬Scache ∧ mem_lock)) goto ld_retry;
  q = alloc_slot(Dcache, data→z.o);
  if (¬q) goto ld_retry;
  if (Scache) set_lock(&Dcache→filler, Scache→lock)
  else set_lock(&Dcache→filler, mem_lock);
  set_lock(self, Dcache→fill_lock);
  data→ptr_b = Dcache→filler_ctl.ptr_b = (void *) q;
  Dcache→filler_ctl.z.o = data→z.o;
  startup(&Dcache→filler, Scache ? Scache→access_time : mem_addr_time);
  data→state = ld_ready;
  if (data→i ≡ preld ∨ data→i ≡ prest) goto fin_ex; else sleep;

This code is used in section 273.

275. If a prest instruction makes it to the hot seat, we have been assured by the user
of PREST that the current values of bytes in virtual addresses data→y.o - (data→xx &
-Dcache→bb) through data→y.o + (data→xx & (Dcache→bb - 1)) are irrelevant. Hence
we can pretend that we know they are zero. This is advantageous if it saves us from
filling a cache block from the S-cache or from memory.

⟨Check for prest with a fully spanned cache block 275⟩ ≡
  if (data→i ≡ prest ∧
      (data→xx ≥ Dcache→bb ∨ ((data→y.o.l & (Dcache→bb - 1)) ≡ 0)) ∧
      ((data→y.o.l + (data→xx & (Dcache→bb - 1)) + 1) ⊕ data→y.o.l) ≥ Dcache→bb)
    goto prest_span;

This code is used in section 274.

276. ⟨Special cases for states in later stages 272⟩ +≡
prest_span: data→state = prest_win;
case prest_win: if (data ≠ old_hot ∨ Dlocker.next) wait(1);
  if (Dcache→lock) goto fin_ex;
  q = alloc_slot(Dcache, data→z.o);   /* OK if Dcache→filler is busy */
  if (q) {
    clean_block(Dcache, q);
    q→tag = data→z.o; q→tag.l &= -Dcache→bb;
    set_lock(&Dlocker, Dcache→lock);
    startup(&Dlocker, Dcache→copy_in_time);
  }
  goto fin_ex;

277. ⟨Do load/store stage 2 without D-cache lookup 277⟩ ≡
  avoid_D: if (mem_lock) wait(1);
    set_lock(&mem_locker, mem_lock);
    startup(&mem_locker, mem_addr_time + mem_read_time);
    data→x.o = mem_read(data→z.o);
    data→state = ld_ready; wait(mem_addr_time + mem_read_time);

This code is used in section 273.
```
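The address arithmetic of §275 can be tried out in isolation. The following is a minimal sketch, not the simulator's code: the function names are hypothetical, plain 64-bit integers stand in for octabytes, and `bb` is assumed to be a power of two. The first two functions evaluate the two endpoint formulas quoted in the text; the third is one straightforward way (an assumption of this sketch) to ask whether the cache block containing a given address lies entirely inside that range.

```c
#include <stdbool.h>
#include <stdint.h>

/* First byte of the range: y - (xx & -bb). Hypothetical helper. */
uint64_t prest_range_lo(uint64_t y, uint8_t xx, uint64_t bb) {
    return y - (xx & ~(bb - 1));   /* ~(bb-1) equals -bb for a power of two */
}

/* Last byte of the range: y + (xx & (bb - 1)). Hypothetical helper. */
uint64_t prest_range_hi(uint64_t y, uint8_t xx, uint64_t bb) {
    return y + (xx & (bb - 1));
}

/* True if the bb-byte block containing addr lies wholly within [lo, hi]. */
bool block_fully_spanned(uint64_t addr, uint64_t lo, uint64_t hi, uint64_t bb) {
    uint64_t start = addr & ~(bb - 1);   /* first byte of the block */
    uint64_t end = start + bb - 1;       /* last byte of the block */
    return start >= lo && end <= hi;
}
```

With 64-byte blocks, y = #10c0 and xx = 200, the endpoints are #1000 and #10c8; the block at #1040 is fully spanned, while the block at #10c0 is not, because the range stops partway through it.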

```
278. ⟨Check for a hit in pending writes 278⟩ ≡
  {
    octa *m = write_search(data, data→z.o);
    if (m ≡ DUNNO) wait(1);
    if (m) {
      data→x.o = *m;
      data→state = ld_ready;
      wait(1);
    }
  }
```

This code is used in section 273.

```
access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
bb: int, §167.
clean_block: void (), §179.
copy_in_time: int, §167.
data: register control *, §124.
Dcache: cache *, §168.
Dlocker: coroutine, §127.
DUNNO = macro, §254.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
fin_ex: label, §144.
i: internal_opcode, §44.
l: tetra, §17.
ld_ready = 13, §267.
ld_retry: label, §273.
lock: lockvar, §167.
mem_addr_time: int, §214.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem_read: octa (), §210.
mem_read_time: int, §214.
next: coroutine *, §23.
o: octa, §40.
octa = struct, §17.
old_hot: control *, §60.
preld = 61, §49.
prest = 62, §49.
prest_win = 15, §267.
ptr_b: void *, §44.
q: register cacheblock *, §258.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro(), §37.
sleep = macro, §125.
startup: static void (), §31.
state: int, §44.
tag: octa, §167.
wait = macro(), §125.
write_search: static octa *(), §255.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
```

**279.** The requested octabyte will arrive sooner or later in *data→x.o*. Then a load instruction is almost done, except that we might need to massage the input a little bit.
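The "massage" for byte, wyde, and tetra loads amounts to shifting the wanted field to the top of the octabyte and then shifting it back down, with or without sign extension. Here is a minimal sketch under stated assumptions: `extract_field` is a hypothetical name, a `uint64_t` stands in for an octa, `size` must be 1, 2, or 4, and the sign extension is written out explicitly because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the size-byte field at address addr from a big-endian octabyte.
   Mirrors the pattern j = (addr & mask) << 3, i = 64 - 8*size. */
uint64_t extract_field(uint64_t octa, unsigned addr, unsigned size, bool is_unsigned) {
    unsigned j = (addr & (8 - size)) << 3;  /* &7 for bytes, &6 for wydes, &4 for tetras */
    unsigned i = 64 - 8 * size;             /* 56, 48, or 32 */
    uint64_t r = (octa << j) >> i;          /* field now sits in the low bits */
    if (is_unsigned) return r;
    uint64_t sign = (uint64_t)1 << (8 * size - 1);
    return (r ^ sign) - sign;               /* portable sign extension */
}
```

For the octabyte #0123456789abcdef, the unsigned byte at offset 7 is #ef, while the signed byte at the same offset extends to #ffffffffffffffef.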

```
⟨Special cases for states in later stages 272⟩ +≡
case ld_ready: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  if (data→i ≥ st) goto finish_store;
  switch (data→op ≫ 1) {
  case LDB ≫ 1: case LDBU ≫ 1: j = (data→z.o.l & #7) ≪ 3; i = 56; goto fin_ld;
  case LDW ≫ 1: case LDWU ≫ 1: j = (data→z.o.l & #6) ≪ 3; i = 48; goto fin_ld;
  case LDT ≫ 1: case LDTU ≫ 1: j = (data→z.o.l & #4) ≪ 3; i = 32;
  fin_ld: data→x.o = shift_right(shift_left(data→x.o, j), i, data→op & #2);
  default: goto fin_ex;
  case LDHT ≫ 1: if (data→z.o.l & 4) data→x.o.h = data→x.o.l;
    data→x.o.l = 0; goto fin_ex;
  case LDSF ≫ 1: if (data→z.o.l & 4) data→x.o.h = data→x.o.l;
    if ((data→x.o.h & #7f800000) ≡ 0 ∧ (data→x.o.h & #7fffff)) {
      data→x.o = load_sf(data→x.o.h);
      data→state = 3; wait(denin_penalty);
    }
    else data→x.o = load_sf(data→x.o.h); goto fin_ex;
  case LDPTP ≫ 1: if ((data→x.o.h & sign_bit) ≡ 0 ∨ (data→x.o.l & #1ff8) ≠ page_n)
      data→x.o = zero_octa;
    else data→x.o.l &= -(1 ≪ 13);
    goto fin_ex;
  case LDPTE ≫ 1: if ((data→x.o.l & #1ff8) ≠ page_n) data→x.o = zero_octa;
    else data→x.o = incr(oandn(data→x.o, page_mask), data→x.o.l & #7);
    data→x.o.h &= #ffff; goto fin_ex;
  case UNSAVE ≫ 1: ⟨Handle an internal UNSAVE when it's time to load 336⟩;
  }

280. ⟨Special cases for states in later stages 272⟩ +≡
finish_store: data→state = st_ready;
case st_ready: switch (data→i) {
  case st: case pst: ⟨Finish a store command 281⟩;
  case syncd: data→b.o.l = (Dcache ? Dcache→bb : 8192); goto do_syncd;
  case syncid: data→b.o.l = (Icache ? Icache→bb : 8192);
    if (Dcache ∧ Dcache→bb < data→b.o.l) data→b.o.l = Dcache→bb;
    goto do_syncid;
  }

281. Store instructions have an extra complication, because some of them need to
check for overflow.

⟨Finish a store command 281⟩ ≡
  data→x.addr = data→z.o;
  if (data→b.p) wait(1);
  switch (data→op ≫ 1) {
  case STUNC ≫ 1: data→i = stunc;
  default: data→x.o = data→b.o; goto fin_ex;
  case STSF ≫ 1: set_round; data→b.o.h = store_sf(data→b.o);
    data→interrupt |= exceptions;
    if ((data→b.o.h & #7f800000) ≡ 0 ∧ (data→b.o.h & #7fffff)) {
      if (data→z.o.l & 4) data→x.o.l = data→b.o.h;
      else data→x.o.h = data→b.o.h;
      data→state = 3; wait(denout_penalty);
    }
  case STHT ≫ 1: if (data→z.o.l & 4) data→x.o.l = data→b.o.h;
    else data→x.o.h = data→b.o.h;
    goto fin_ex;
  case STB ≫ 1: case STBU ≫ 1: j = (data→z.o.l & #7) ≪ 3; i = 56; goto fin_st;
  case STW ≫ 1: case STWU ≫ 1: j = (data→z.o.l & #6) ≪ 3; i = 48; goto fin_st;
  case STT ≫ 1: case STTU ≫ 1: j = (data→z.o.l & #4) ≪ 3; i = 32;
  fin_st: ⟨Insert data→b.o into the proper field of data→x.o, checking for arithmetic
        exceptions if signed 282⟩;
    goto fin_ex;
  case CSWAP ≫ 1: ⟨Finish a CSWAP 283⟩;
  case SAVE ≫ 1: ⟨Handle an internal SAVE when it's time to store 342⟩;
  }
```

This code is used in section 280.

```
addr: octa, §40.
b: spec, §44.
bb: int, §167.
CSWAP = #94, §47.
data: register control *, §124.
Dcache: cache *, §168.
denin_penalty: int, §349.
denout_penalty: int, §349.
do_syncd: label, §364.
do_syncid: label, §364.
exceptions: int, MMIX-ARITH §32.
fin_ex: label, §144.
h: tetra, §17.
i: register int, §12.
i: internal_opcode, §44.
Icache: cache *, §168.
incr: octa (), MMIX-ARITH §6.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
ld_ready = 13, §267.
LDB = #80, §47.
LDBU = #82, §47.
LDHT = #92, §47.
LDPTE = macro, §235.
LDPTP = macro, §235.
LDSF = #90, §47.
LDT = #88, §47.
LDTU = #8a, §47.
LDW = #84, §47.
LDWU = #86, §47.
load_sf: octa (), MMIX-ARITH §39.
lockloc: coroutine **, §23.
o: octa, §40.
oandn: octa (), MMIX-ARITH §25.
op: mmix_opcode, §44.
p: specnode *, §40.
page_mask: octa, §238.
page_n: int, §238.
pst = 66, §49.
SAVE = #fa, §47.
self: register coroutine *, §124.
set_round = macro, §346.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §80.
st = 63, §49.
st_ready = 14, §267.
state: int, §44.
STB = #a0, §47.
STBU = #a2, §47.
STHT = #b2, §47.
store_sf: tetra (), MMIX-ARITH §40.
STSF = #b0, §47.
STT = #a8, §47.
STTU = #aa, §47.
stunc = 67, §49.
STUNC = #b6, §47.
STW = #a4, §47.
STWU = #a6, §47.
syncd = 64, §49.
syncid = 65, §49.
UNSAVE = #fb, §47.
wait = macro(), §125.
x: specnode, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
```

```
282. ⟨Insert data→b.o into the proper field of data→x.o, checking for arithmetic exceptions if signed 282⟩ ≡
  {
    octa mask;
    if (¬(data→op & 2)) { octa before, after;
      before = data→b.o; after = shift_right(shift_left(data→b.o, i), i, 0);
      if (before.l ≠ after.l ∨ before.h ≠ after.h) data→interrupt |= V_BIT;
    }
    mask = shift_right(shift_left(neg_one, i), j, 1);
    data→b.o = shift_right(shift_left(data→b.o, i), j, 1);
    data→x.o.h ⊕= mask.h & (data→x.o.h ⊕ data→b.o.h);
    data→x.o.l ⊕= mask.l & (data→x.o.l ⊕ data→b.o.l);
  }
```
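The field insertion of §282 rests on the identity x ⊕ (mask & (x ⊕ b)): bits under the mask are taken from b, all others from x. The sketch below illustrates that idea with 64-bit integers; the function names are hypothetical, `size` is assumed to be 1, 2, or 4, and the overflow test is written as an explicit sign extension rather than with the simulator's *shift\_left* and *shift\_right* routines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Replace the size-byte field of x at address addr by the low bytes of b. */
uint64_t insert_field(uint64_t x, uint64_t b, unsigned addr, unsigned size) {
    unsigned i = 64 - 8 * size;               /* 56, 48, or 32 */
    unsigned j = (addr & (8 - size)) << 3;    /* bit offset from the left */
    uint64_t mask = (~(uint64_t)0 << i) >> j; /* ones exactly over the field */
    uint64_t field = (b << i) >> j;           /* b's low bytes moved into place */
    return x ^ (mask & (x ^ field));          /* masked merge */
}

/* A signed store overflows when b is not the sign extension of its low bytes. */
bool store_overflows(uint64_t b, unsigned size) {
    uint64_t sign = (uint64_t)1 << (8 * size - 1);
    uint64_t low = b & ((sign << 1) - 1);
    return ((low ^ sign) - sign) != b;
}
```

For example, storing the byte #ff at offset 7 of #0123456789abcdef yields #0123456789abcdff, and a signed byte store of #80 (that is, +128) is reported as an overflow.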

**283.** The CSWAP operation has four inputs (\$X, \$Y, \$Z, rP) as well as three outputs (\$X, $\mathrm{M}_8[\mathrm{A}]$, rP). To keep from exceeding the capacity of the control blocks in our pipeline, we wait until this instruction reaches the hot seat, thereby allowing us non-speculative access to rP.

```
⟨Finish a CSWAP 283⟩ ≡
  if (data ≠ old_hot) wait(1);
  if (data→x.o.h ≡ g[rP].o.h ∧ data→x.o.l ≡ g[rP].o.l) {
    data→a.o.l = 1;   /* data→a.o.h is zero */
    data→x.o = data→b.o;
  } else {
    g[rP].o = data→x.o;   /* data→a.o is zero */
    if (verbose & issue_bit) {
      printf(" setting rP="); print_octa(g[rP].o); printf("\n");
    }
  }
  data→i = cswap;   /* cosmetic change, affects the trace output only */
  goto fin_ex;
```

This code is used in section 281.

This code is used in section 281.
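Stripped of pipeline bookkeeping, the compare-and-swap logic of §283 is small. The following sketch is only an illustration under stated assumptions: `cswap_sketch` is a hypothetical name, the memory octabyte and rP are modeled as plain `uint64_t` variables, and the boolean result plays the role of the value that ends up in \$X.

```c
#include <stdbool.h>
#include <stdint.h>

/* If *mem equals *rP, store new_value and report success;
   otherwise copy the memory contents into *rP and report failure. */
bool cswap_sketch(uint64_t *mem, uint64_t *rP, uint64_t new_value) {
    if (*mem == *rP) {
        *mem = new_value;   /* the swap happens */
        return true;
    }
    *rP = *mem;             /* the prediction register learns the actual value */
    return false;
}
```

A failed attempt leaves memory unchanged but updates rP, so an immediate retry with the same arguments would succeed if nothing else intervened.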

**284.** The fetch stage. Now that we've mastered the most difficult memory operations, we can relax and apply our knowledge to the slightly simpler task of filling the fetch buffer. Fetching is like loading/storing, except that we use the I-cache instead of the D-cache. It's slightly simpler because the I-cache is read-only. Further simplifications would be possible if there were no PREGO instruction, because there is only one fetch unit. However, we want to implement PREGO with reasonable efficiency, in order to see if that instruction is worthwhile; so we include the complications of simultaneous I-cache and IT-cache readers, which we have already implemented for the D-cache and DT-cache.

The fetch coroutine is always present, as the one and only coroutine with *stage* number zero.

In normal circumstances, the fetch coroutine accesses a cache block containing the instruction whose virtual address is given by *inst\_ptr* (the instruction pointer), and transfers up to *fetch\_max* instructions from that block to the fetch buffer. Complications arise if the instruction isn't in the cache, or if we can't translate the virtual address because of a miss in the IT-cache. Moreover, *inst\_ptr* is a **spec** variable whose value might not even be known; if *inst\_ptr.p* is nonnull, we don't know what to fetch.

```
⟨External variables 4⟩ +≡

Extern spec inst_ptr; /* the instruction pointer (aka program counter) */

Extern octa *fetched; /* buffer for incoming instructions */
```

**285.** The fetch coroutine usually begins a cycle in state *fetch\_ready*, with the most recently fetched octabytes in positions *fetch\_lo*, *fetch\_lo* + 1, ..., *fetch\_hi* - 1 of a buffer called *fetched*. Once that buffer has been exhausted, the coroutine reverts to state 0; with luck, the buffer might have more data by the time the next cycle rolls around.

```
⟨ Global variables 20⟩ +≡
int fetch_lo, fetch_hi; /* the active region of that buffer */
coroutine fetch_co;
control fetch_ctl;
```
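To make the active region concrete, here is a small sketch of how *fetch\_lo* and *fetch\_hi* relate to an instruction address when a whole cache block has been copied into *fetched*; the formulas follow those that appear in §294. The function names are hypothetical, `bb` (the block size in bytes) is assumed to be a power of two, and the address is assumed to be a multiple of 4.

```c
#include <stdint.h>

/* Index of the octabyte within the block that contains addr. */
unsigned fetch_lo_for(uint64_t addr, unsigned bb) {
    return (unsigned)(addr & (bb - 1)) >> 3;
}

/* One past the last octabyte of the block. */
unsigned fetch_hi_for(unsigned bb) {
    return bb >> 3;
}

/* Number of four-byte instructions from addr to the end of the block. */
unsigned instructions_left(uint64_t addr, unsigned bb) {
    return (bb - (unsigned)(addr & (bb - 1))) >> 2;
}
```

With 64-byte blocks and an instruction pointer of #1028, the active region runs from position 5 to position 7, and six instructions remain in the block.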

```
a: specnode, §44.
b: spec, §44.
control = struct, §44.
coroutine = struct, §23.
cswap = 68, §49.
data: register control *, §124.
Extern = macro, §4.
fetch_max: int, §59.
fetch_ready = 23, §291.
fin_ex: label, §144.
g: specnode [], §86.
h: tetra, §17.
i: register int, §12.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
issue_bit = 1 ≪ 0, §8.
j: register int, §12.
l: tetra, §17.
neg_one: octa, MMIX-ARITH §4.
o: octa, §40.
octa = struct, §17.
old_hot: control *, §60.
op: mmix_opcode, §44.
p: specnode *, §40.
print_octa: static void (), §19.
```

```
286. ⟨Initialize everything 22⟩ +≡
  fetch_co.ctl = &fetch_ctl;
  fetch_co.name = "Fetch";
  fetch_ctl.go.o.l = 4;
  startup(&fetch_co, 1);

287. ⟨Restart the fetch coroutine 287⟩ ≡
  if (fetch_co.lockloc) *(fetch_co.lockloc) = Λ, fetch_co.lockloc = Λ;
  unschedule(&fetch_co);
  startup(&fetch_co, 1);

This code is used in sections 85, 160, 308, 309, and 316.

288. Some of the actions here are done not only by the fetcher but also by the first
and second stages of a prego operation.

#define wait_or_pass(t)
           if (data→i ≡ prego) { pass_after(t); goto passit; }
           else wait(t)

⟨Simulate an action of the fetch coroutine 288⟩ ≡
switch0: switch (data→state) {
  new_fetch: data→state = 0;
  case 0: ⟨Wait, if necessary, until the instruction pointer is known 290⟩;
    data→y.o = inst_ptr.o;
    data→state = 1; data→interrupt = 0; data→x.o = data→z.o = zero_octa;
  case 1: start_fetch: if (data→y.o.h & sign_bit)
      ⟨Begin fetch with known physical address 296⟩;
    if (page_bad) goto bad_fetch;
    if (ITcache→lock ∨ (j = get_reader(ITcache)) < 0) wait(1);
    startup(&ITcache→reader[j], ITcache→access_time);
    ⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩;
    wait_or_pass(ITcache→access_time);
    ⟨Other cases for the fetch coroutine 298⟩
  }

This code is used in section 125.

289. ⟨Handle special cases for operations like prego and ldvts 289⟩ ≡
  if (data→i ≡ prego) goto start_fetch;

See also section 352.
This code is used in section 266.

290. ⟨Wait, if necessary, until the instruction pointer is known 290⟩ ≡
  if (inst_ptr.p) {
    if (inst_ptr.p ≠ UNKNOWN_SPEC ∧ inst_ptr.p→known)
      inst_ptr.o = inst_ptr.p→o, inst_ptr.p = Λ;
    wait(1);
  }

This code is used in section 288.
```

```
291.
#define got_IT 19          /* state when IT-cache entry has been computed */
#define IT_miss 20         /* state when IT-cache doesn't hold the key */
#define IT_hit 21          /* state when physical instruction address is known */
#define Ihit_and_miss 22   /* state when I-cache misses */
#define fetch_ready 23     /* state when instructions have been read */
#define got_one 24         /* state when a "preview" octabyte is ready */

⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩ ≡
  p = cache_search(ITcache, trans_key(data→y.o));
  if (¬Icache ∨ Icache→lock ∨ (j = get_reader(Icache)) < 0)
    ⟨Begin fetch without I-cache lookup 295⟩;
  startup(&Icache→reader[j], Icache→access_time);
  if (p) ⟨Do a simultaneous lookup in the I-cache 292⟩
  else data→state = IT_miss;
```

This code is used in section 288.

access\_time: int, §167.
bad\_fetch: label, §301.
cache\_search: static cacheblock \*(), §193.
ctl: control \*, §23.
data: register control \*, §124.
fetch\_co: coroutine, §285.
fetch\_ctl: control, §285.
get\_reader: static int (), §183.
go: specnode, §44.
h: tetra, §17.
i: internal\_opcode, §44.
Icache: cache \*, §168.
inst\_ptr: spec, §284.
interrupt: unsigned int, §44.
ITcache: cache \*, §168.
j: register int, §12.
known: bool, §40.
l: tetra, §17.
ldvts = 60, §49.
lock: lockvar, §167.
lockloc: coroutine \*\*, §23.
name: char \*, §23.
o: octa, §40.
p: specnode \*, §40.
p: register cacheblock \*, §258.
page\_bad: bool, §238.
pass\_after = macro (), §125.
passit: label, §134.
prego = 73, §49.
reader: coroutine \*, §167.
sign\_bit = macro, §80.
startup: static void (), §31.
state: int, §44.
trans\_key = macro (), §240.
UNKNOWN\_SPEC = macro, §71.
unschedule: static void (), §33.
wait = macro (), §125.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
zero\_octa: octa, MMIX-ARITH §4.

**292.** We assume that it is possible to look up a virtual address in the IT-cache at the same time as we look for a corresponding physical address in the I-cache, provided that the lower b+c bits of the two addresses are the same. (See the remarks about "page coloring," when we made similar assumptions about the DT-cache and D-cache.)
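The condition on the lower b + c bits can be checked with a single mask. The sketch below is a minimal illustration, assuming hypothetical names and plain integers: `b` and `c` are the logarithms of the block size and of the number of cache sets, `page_s` is the logarithm of the page size, and all shift counts are assumed to be below 64. Bits below `page_s` always agree between a virtual address and its translation, so only bits `page_s` through b + c - 1 need to be compared.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a lookup indexed by the virtual address would consult the wrong
   cache set, because virt and phys differ somewhere in bits [page_s, b+c). */
bool lookup_is_spurious(uint64_t virt, uint64_t phys, int b, int c, int page_s) {
    if (b + c <= page_s) return false;   /* index bits lie inside the page offset */
    uint64_t mask = ((uint64_t)1 << (b + c)) - ((uint64_t)1 << page_s);
    return ((virt ^ phys) & mask) != 0;
}
```

With b = 6, c = 8, and 8K-byte pages (page_s = 13), only bit 13 matters: addresses #2000 and #4000 disagree there, whereas #2000 and #6000 agree.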

```
⟨Do a simultaneous lookup in the I-cache 292⟩ ≡
    ⟨Update IT-cache usage and check the protection bits 293⟩;
    data→z.o = phys_addr(data→y.o, p→data[0]);
    if (Icache→b + Icache→c > page_s ∧
        ((data→y.o.l ⊕ data→z.o.l) & ((Icache→bb ≪ Icache→c) - (1 ≪ page_s))))
      data→state = IT_hit;   /* spurious I-cache lookup */
    else {
      q = cache_search(Icache, data→z.o);
      if (q) {
        q = use_and_fix(Icache, q);
        ⟨Copy the data from block q to fetched 294⟩;
        data→state = fetch_ready;
      } else data→state = Ihit_and_miss;
    }
    wait_or_pass(max(ITcache→access_time, Icache→access_time));

This code is used in section 291.

293. ⟨Update IT-cache usage and check the protection bits 293⟩ ≡
  p = use_and_fix(ITcache, p);
  if (¬(p→data[0].l & (PX_BIT ≫ PROT_OFFSET))) goto bad_fetch;

This code is used in sections 292 and 295.

294. At this point inst_ptr.o equals data→y.o.

⟨Copy the data from block q to fetched 294⟩ ≡
  if (data→i ≠ prego) {
    for (j = 0; j < Icache→bb ≫ 3; j++) fetched[j] = q→data[j];
    fetch_lo = (inst_ptr.o.l & (Icache→bb - 1)) ≫ 3;
    fetch_hi = Icache→bb ≫ 3;
  }

This code is used in sections 292 and 296.

295. ⟨Begin fetch without I-cache lookup 295⟩ ≡
  {
    if (p) {
      ⟨Update IT-cache usage and check the protection bits 293⟩;
      data→z.o = phys_addr(data→y.o, p→data[0]);
      data→state = IT_hit;
    } else data→state = IT_miss;
    wait_or_pass(ITcache→access_time);
  }

This code is used in section 291.
```

```
296. ⟨Begin fetch with known physical address 296⟩ ≡
    if (data→i ≡ prego ∧ ¬(data→loc.h & sign_bit)) goto fin_ex;
    data→z.o = data→y.o; data→z.o.h -= sign_bit;
  known_phys: if (data→z.o.h & #ffff0000) goto bad_fetch;
    if (¬Icache) ⟨Read from memory into fetched 297⟩;
    if (Icache→lock ∨ (j = get_reader(Icache)) < 0) {
      data→state = IT_hit; wait_or_pass(1);
    }
    startup(&Icache→reader[j], Icache→access_time);
    q = cache_search(Icache, data→z.o);
    if (q) {
      q = use_and_fix(Icache, q);
      ⟨Copy the data from block q to fetched 294⟩;
      data→state = fetch_ready;
    } else data→state = Ihit_and_miss;
    wait_or_pass(Icache→access_time);
```

This code is used in section 288.

```
access_time: int, §167.
b: int, §167.
bad_fetch: label, §301.
bb: int, §167.
c: int, §167.
cache_search: static cacheblock *(), §193.
data: octa *, §167.
data: register control *, §124.
fetch_hi: int, §285.
fetch_lo: int, §285.
fetch_ready = 23, §291.
fetched: octa *, §284.
fin_ex: label, §144.
get_reader: static int (), §183.
h: tetra, §17.
i: internal_opcode, §44.
Icache: cache *, §168.
Ihit_and_miss = 22, §291.
inst_ptr: spec, §284.
IT_hit = 21, §291.
IT_miss = 20, §291.
ITcache: cache *, §168.
j: register int, §12.
l: tetra, §17.
loc: octa, §44.
lock: lockvar, §167.
max = macro(), §268.
o: octa, §40.
p: register cacheblock *, §258.
page_s: int, §238.
phys_addr: static octa (), §241.
prego = 73, §49.
PROT_OFFSET = 5, §54.
PX_BIT = 1 ≪ 5, §54.
q: register cacheblock *, §258.
reader: coroutine *, §167.
sign_bit = macro, §80.
startup: static void (), §31.
state: int, §44.
use_and_fix: static cacheblock *(), §196.
wait_or_pass = macro(), §288.
y: spec, §44.
z: spec, §44.
```

```
297. ⟨Read from memory into fetched 297⟩ ≡
  { octa addr;
    addr = data→z.o;
    if (mem_lock) wait(1);
    set_lock(&mem_locker, mem_lock);
    startup(&mem_locker, mem_addr_time + mem_read_time);
    addr.l &= -(bus_words ≪ 3);
    fetched[0] = mem_read(addr);
    for (j = 1; j < bus_words; j++)
      fetched[j] = mem_hash[last_h].chunk[((addr.l & #ffff) ≫ 3) + j];
    fetch_lo = (data→z.o.l ≫ 3) & (bus_words - 1); fetch_hi = bus_words;
    data→state = fetch_ready;
    wait(mem_addr_time + mem_read_time);
  }

This code is used in section 296.

298. ⟨Other cases for the fetch coroutine 298⟩ ≡
case IT_miss: if (ITcache→filler.next)
    if (data→i ≡ prego) goto fin_ex; else wait(1);
  if (no_hardware_PT ∨ page_f) ⟨Insert dummy instruction for page table emulation 302⟩;
  p = alloc_slot(ITcache, trans_key(data→y.o));
  if (¬p)   /* hey, it was present after all */
    if (data→i ≡ prego) goto fin_ex; else goto new_fetch;
  data→ptr_b = ITcache→filler_ctl.ptr_b = (void *) p;
  ITcache→filler_ctl.y.o = data→y.o;
  set_lock(self, ITcache→fill_lock);
  startup(&ITcache→filler, 1);
  data→state = got_IT;
  if (data→i ≡ prego) goto fin_ex; else sleep;
case got_IT: release_lock(self, ITcache→fill_lock);
  if (¬(data→z.o.l & (PX_BIT ≫ PROT_OFFSET))) goto bad_fetch;
  data→z.o = phys_addr(data→y.o, data→z.o);
fetch_retry: data→state = IT_hit;
case IT_hit: if (data→i ≡ prego) goto fin_ex; else goto known_phys;
case Ihit_and_miss: ⟨Try to get the contents of location data→z.o in the I-cache 300⟩;

See also section 301.
This code is used in section 288.

299. ⟨Special cases for states in later stages 272⟩ +≡
case IT_miss: case Ihit_and_miss: case IT_hit: case fetch_ready: goto switch0;
```

```
300. ⟨Try to get the contents of location data→z.o in the I-cache 300⟩ ≡
  if (Icache→filler.next) goto fetch_retry;
  if ((Scache ∧ Scache→lock) ∨ (¬Scache ∧ mem_lock)) goto fetch_retry;
  q = alloc_slot(Icache, data→z.o);
  if (¬q) goto fetch_retry;
  if (Scache) set_lock(&Icache→filler, Scache→lock)
  else set_lock(&Icache→filler, mem_lock);
  set_lock(self, Icache→fill_lock);
  data→ptr_b = Icache→filler_ctl.ptr_b = (void *) q;
  Icache→filler_ctl.z.o = data→z.o;
  startup(&Icache→filler, Scache ? Scache→access_time : mem_addr_time);
  data→state = got_one;
  if (data→i ≡ prego) goto fin_ex; else sleep;
This code is used in section 298.
```

```
access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
bad_fetch: label, §301.
bus_words: int, §214.
chunk: octa *, §206.
data: register control *, §124.
fetch_hi: int, §285.
fetch_lo: int, §285.
fetch_ready = 23, §291.
fetched: octa *, §284.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
fin_ex: label, §144.
got_IT = 19, §291.
got_one = 24, §291.
i: internal_opcode, §44.
Icache: cache *, §168.
Ihit_and_miss = 22, §291.
IT_hit = 21, §291.
IT_miss = 20, §291.
ITcache: cache *, §168.
j: register int, §12.
known_phys: label, §296.
l: tetra, §17.
last_h: int, §211.
lock: lockvar, §167.
mem_addr_time: int, §214.
mem_hash: chunknode *, §207.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem_read: octa (), §210.
mem_read_time: int, §214.
new_fetch: label, §288.
next: coroutine *, §23.
no_hardware_PT: bool, §242.
o: octa, §40.
octa = struct, §17.
p: register cacheblock *, §258.
page_f: int, §238.
```

phys\_addr: static octa (), §241.
prego = 73, §49.
PROT\_OFFSET = 5, §54.
ptr\_b: void \*, §44.
PX\_BIT = 1 ≪ 5, §54.
q: register cacheblock \*, §258.
release\_lock = macro (), §37.
Scache: cache \*, §168.
self: register coroutine \*, §124.
set\_lock = macro (), §37.
sleep = macro, §125.
startup: static void (), §31.
state: int, §44.
switch0: label, §288.
trans\_key = macro (), §240.
wait = macro (), §125.
y: spec, §44.
z: spec, §44.

**301.** The I-cache filler will wake us up with the octabyte we want, before it has filled the entire cache block. In that case we can fetch one or two instructions before the rest of the block has been loaded.

```
⟨Other cases for the fetch coroutine 298⟩ +≡
bad_fetch: if (data→i ≡ prego) goto fin_ex;
  data→interrupt |= PX_BIT;
swym_one: fetched[0].h = fetched[0].l = SWYM ≪ 24;
  goto fetch_one;
case got_one: fetched[0] = data→x.o;   /* a "preview" of the new cache data */
fetch_one: fetch_lo = 0; fetch_hi = 1;
  data→state = fetch_ready;
case fetch_ready: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  if (data→i ≡ prego) goto fin_ex;
  for (j = 0; j < fetch_max; j++) {
    register fetch *new_tail;
    if (tail ≡ fetch_bot) new_tail = fetch_top;
    else new_tail = tail − 1;
    if (new_tail ≡ head) break;   /* fetch buffer is full */
    ⟨Install a new instruction into the tail position 304⟩;
    tail = new_tail;
    if (sleepy) {
      sleepy = false; sleep;
    }
    inst_ptr.o = incr(inst_ptr.o, 4);
    if (fetch_lo ≡ fetch_hi) goto new_fetch;
  }
  wait(1);
```

**302.**

```
⟨Insert dummy instruction for page table emulation 302⟩ ≡
  if (cache_search(ITcache, trans_key(inst_ptr.o))) goto new_fetch;
  data→interrupt |= F_BIT;
  sleepy = true;
  goto swym_one;
```

This code is used in section 298.

**303.**

```
⟨Global variables 20⟩ +≡
  bool sleepy;   /* have we just emitted the page table emulation call? */
```
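The way the fetch coroutine walks *tail* downward through the fetch buffer, wrapping from *fetch_bot* to *fetch_top* and stopping when it would collide with *head*, can be sketched in isolation. The following is only an illustration: the four-slot buffer, the `fetch_slot` type, and the helper names `step_down` and `push_inst` are assumptions made for the example and are not part of the simulator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the simulator's own fetch record is richer. */
enum { BUF_SLOTS = 4 };               /* assumed size: capacity 3 plus one spare */
typedef struct { unsigned inst; } fetch_slot;

static fetch_slot buf[BUF_SLOTS];
static fetch_slot *const fetch_bot = buf;
static fetch_slot *const fetch_top = buf + BUF_SLOTS - 1;

/* The buffer is filled downward: step to the next lower slot, wrapping. */
static fetch_slot *step_down(fetch_slot *p) {
    assert(p >= fetch_bot && p <= fetch_top);
    return p == fetch_bot ? fetch_top : p - 1;
}

/* Install one instruction at *tail; return 0 if the buffer is full. */
static int push_inst(fetch_slot **tail, fetch_slot *head, unsigned inst) {
    fetch_slot *new_tail = step_down(*tail);
    if (new_tail == head) return 0;   /* fetch buffer is full */
    (*tail)->inst = inst;             /* install into the tail position */
    *tail = new_tail;
    return 1;
}
```

With *head* and *tail* both starting at the top slot, three calls to `push_inst` succeed and the fourth reports a full buffer, because one slot is always left empty to distinguish "full" from "empty".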

**304.** At this point we check for egregiously invalid instructions. (Sometimes the dispatcher will actually allow such instructions to occupy the fetch buffer, for internally generated commands.)

```
⟨Install a new instruction into the tail position 304⟩ ≡
  tail→loc = inst_ptr.o;
  if (inst_ptr.o.l & 4) tail→inst = fetched[fetch_lo++].l;
  else tail→inst = fetched[fetch_lo].h;
  tail→interrupt = data→interrupt;
  i = tail→inst ≫ 24;
  if (i ≥ RESUME ∧ i ≤ SYNC ∧ (tail→inst & bad_inst_mask[i − RESUME]))
    tail→interrupt |= B_BIT;
  tail→noted = false;
  if (inst_ptr.o.l ≡ breakpoint.l ∧ inst_ptr.o.h ≡ breakpoint.h) breakpoint_hit = true;
This code is used in section 301.
```

**305.** The commands RESUME, SAVE, UNSAVE, and SYNC should not have nonzero bits in the positions defined here.

```
⟨Global variables 20⟩ +≡
int bad_inst_mask[4] = {#fffffe, #ffff, #ffff00, #fffff8};
```
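As a hedged illustration of how such a mask table is applied, the sketch below tests a 32-bit instruction word the same way the fetch coroutine does before setting B_BIT. The function name `is_egregious` is invented for the example, and the SAVE entry is taken to be #ffff on the assumption that `SAVE $X,0` requires zero Y and Z bytes while leaving X free.

```c
#include <assert.h>
#include <stdint.h>

enum { RESUME = 0xf9, SAVE = 0xfa, UNSAVE = 0xfb, SYNC = 0xfc };

/* Bits of the 24-bit XYZ field that must be zero, indexed by opcode - RESUME.
   The SAVE entry (0xffff) is an assumption; see the lead-in. */
static const uint32_t bad_inst_mask[4] = {0xfffffe, 0xffff, 0xffff00, 0xfffff8};

/* Return nonzero if the instruction word would be flagged as invalid. */
static int is_egregious(uint32_t inst) {
    uint32_t op = inst >> 24;
    if (op < RESUME || op > SYNC) return 0;   /* other opcodes are not checked here */
    assert(op - RESUME < 4);
    return (inst & bad_inst_mask[op - RESUME]) != 0;
}
```

For instance, `RESUME 1` (#f9000001) and `SYNC 7` (#fc000007) pass, while `RESUME 2` and `SYNC 8` are flagged.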

```
B_BIT = 1 ≪ 2, §54.
bool = enum, §11.
breakpoint: octa, §10.
breakpoint_hit: bool, §12.
cache_search: static cacheblock *(), §193.
data: register control *, §124.
F_BIT = 1 ≪ 17, §54.
false = 0, §11.
fetch = struct, §68.
fetch_bot: fetch *, §69.
fetch_hi: int, §285.
fetch_lo: int, §285.
fetch_max: int, §59.
fetch_ready = 23, §291.
fetch_top: fetch *, §69.
fetched: octa *, §284.
fin_ex: label, §144.
got_one = 24, §291.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
i: register int, §12.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
inst_ptr: spec, §284.
interrupt: unsigned int, §44.
interrupt: unsigned int, §68.
ITcache: cache *, §168.
j: register int, §12.
l: tetra, §17.
loc: octa, §68.
lockloc: coroutine **, §23.
new_fetch: label, §288.
noted: bool, §68.
o: octa, §40.
prego = 73, §49.
PX_BIT = 1 ≪ 5, §54.
RESUME = #f9, §47.
self: register coroutine *, §124.
sleep = macro, §125.
state: int, §44.
SWYM = #fd, §47.
SYNC = #fc, §47.
tail: fetch *, §69.
trans_key = macro (), §240.
true = 1, §11.
wait = macro (), §125.
x: specnode, §44.
```

**306.** Interrupts. The scariest thing about the design of a pipelined machine is the existence of interrupts, which disrupt the smooth flow of a computation in ways that are difficult to anticipate. Fortunately, however, the discipline of a reorder buffer, which forces instructions to be committed in order, allows us to deal with interrupts in a fairly natural way. Our solution to the problems of dynamic scheduling and speculative execution therefore solves the interrupt problem as well.

MMIX has three kinds of interrupts, which show up as bit codes in the *interrupt* field when an instruction is ready to be committed: H\_BIT invokes a trip handler, for TRIP instructions and arithmetic exceptions; F\_BIT invokes a forced-trap handler, for TRAP instructions and unimplemented instructions that need to be emulated in software; E\_BIT invokes a dynamic-trap handler, for external interrupts like I/O signals or for internal interrupts caused by improper instructions. In all three cases, the pipeline control has already been redirected to fetch new instructions starting at the correct handler address by the time an interrupted instruction is ready to be committed.

**307.** Most instructions come to the following part of the program, if they have finished execution with any 1s among the eight trip bits or the eight trap bits.

If the trip bits aren't all zero, we want to update the event bits of rA, or perform an enabled trip handler, or both. If the trap bits are nonzero, we need to hold onto them until we get to the hot seat, when they will be joined with the bits of rQ and probably cause an interrupt. A load or store instruction with nonzero trap bits will be nullified, not committed.

Underflow that is exact and not enabled is ignored, in accordance with the IEEE standard conventions. (This applies also to underflow triggered by RESUME\_SET.)

```
#define is_load_store(i) (i ≥ ld ∧ i ≤ cswap)
⟨Handle interrupt at end of execution stage 307⟩ ≡
  {
    if ((data→interrupt & #ff) ∧ is_load_store(data→i)) goto state_5;
    j = data→interrupt & #ff00;
    data→interrupt −= j;
    if ((j & (U_BIT + X_BIT)) ≡ U_BIT ∧ ¬(data→ra.o.l & U_BIT)) j &= ~U_BIT;
    data→arith_exc = (j & ~data→ra.o.l) ≫ 8;
    if (j & data→ra.o.l) ⟨Prepare for exceptional trip handler 308⟩;
    if (data→interrupt & #ff) goto state_5;
  }
```

This code is used in section 144.
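The bit manipulation in this section can be tried out separately. The sketch below is a simplified model, not the simulator's code: it takes the *interrupt* field and the low tetrabyte of rA, drops an exact and non-enabled underflow, and splits the remaining trip bits into those that merely become event bits and those whose enable bit in rA calls for a trip. The struct `trip_split` and the function `split_trip_bits` are names assumed for the example.

```c
#include <assert.h>
#include <stdint.h>

#define X_BIT (1u << 8)    /* floating inexact */
#define U_BIT (1u << 10)   /* floating underflow */

typedef struct {
    uint32_t events;   /* bits destined for the event byte of rA (bits 0..7) */
    uint32_t trips;    /* exceptions with their enable bit set, still in bits 8..15 */
} trip_split;

/* interrupt: an instruction's interrupt field; ra_low: low tetrabyte of rA. */
static trip_split split_trip_bits(uint32_t interrupt, uint32_t ra_low) {
    uint32_t j = interrupt & 0xff00;          /* the eight trip bits */
    trip_split r;
    /* exact underflow that is not enabled is ignored */
    if ((j & (U_BIT + X_BIT)) == U_BIT && !(ra_low & U_BIT)) j &= ~U_BIT;
    r.events = (j & ~ra_low) >> 8;
    r.trips = j & ra_low;
    assert((r.events & ~0xffu) == 0);
    return r;
}
```

An underflow reported together with inexact survives as two event bits, whereas a lone underflow with no enabled handler disappears entirely.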

**308.** Since execution is speculative, an exceptional condition might not be part of the "real" computation. Indeed, the present coroutine might have already been deissued.

```
⟨Prepare for exceptional trip handler 308⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0;   /* clear the fetch buffer */
  ⟨Restart the fetch coroutine 287⟩;
  cool_hist = data→hist;
  for (i = j & data→ra.o.l, m = 16; ¬(i & D_BIT); i ≪= 1, m += 16) ;
  data→arith_exc = (j & ~(#10000 ≫ (m ≫ 4))) ≫ 8;   /* trips taken are not logged as events */
  data→go.o.h = 0, data→go.o.l = m;
  inst_ptr.o = data→go.o, inst_ptr.p = Λ;
  data→interrupt |= H_BIT;
  goto state_4;
```

This code is used in section 307.

**309.**

```
⟨Prepare to emulate the page translation 309⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0;   /* clear the fetch buffer */
  ⟨Restart the fetch coroutine 287⟩;
  cool_hist = data→hist;
  inst_ptr.p = UNKNOWN_SPEC;
  data→interrupt |= F_BIT;
```

This code is used in section 310.
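The **for** loop of section 308 finds the highest-priority enabled exception and converts it to a handler address: D\_BIT yields 16, the next bit down yields 32, and so on in steps of 16. A minimal standalone sketch follows; the function names `trip_address` and `remaining_events` are assumptions for illustration, and the argument is assumed to be a nonzero subset of bits 8 through 15.

```c
#include <assert.h>
#include <stdint.h>

#define D_BIT (1u << 15)   /* the leftmost of the eight trip bits */

/* enabled: nonzero subset of bits 8..15; returns the trip handler address m. */
static unsigned trip_address(uint32_t enabled) {
    unsigned m = 16;
    uint32_t i;
    assert(enabled & 0xff00);                 /* otherwise the loop would not stop */
    for (i = enabled & 0xff00; !(i & D_BIT); i <<= 1) m += 16;
    return m;
}

/* Event bits still to be recorded once the trip for address m is taken. */
static uint32_t remaining_events(uint32_t j, unsigned m) {
    return (j & ~(0x10000u >> (m >> 4))) >> 8;
}
```

If both underflow (bit 10) and inexact (bit 8) are pending and enabled, the address is 96 for underflow, and only the inexact bit remains to be logged as an event.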

```
arith_exc: unsigned int, §44.
cool: control *, §60.
cool_hist: unsigned int, §99.
cswap = 68, §49.
D_BIT = 1 ≪ 15, §54.
data: register control *, §124.
deissues: int, §60.
die: label, §144.
E_BIT = 1 ≪ 18, §54.
F_BIT = 1 ≪ 17, §54.
go: specnode, §44.
h: tetra, §17.
H_BIT = 1 ≪ 16, §54.
head: fetch *, §69.
hist: unsigned int, §44.
i: internal_opcode, §44.
i: register int, §12.
inst_ptr: spec, §284.
interrupt: unsigned int, §44.
issued_between: static int (), §159.
j: register int, §12.
l: tetra, §17.
ld = 56, §49.
m: register int, §12.
o: octa, §40.
old_tail: fetch *, §70.
p: specnode *, §40.
ra: spec, §44.
RESUME_SET = 2, §320.
resuming: int, §78.
state_4: label, §310.
state_5: label, §310.
tail: fetch *, §69.
U_BIT = 1 ≪ 10, §54.
UNKNOWN_SPEC = macro, §71.
X_BIT = 1 ≪ 8, §54.
```

**310.** We need to stop dispatching when calling a trip handler from within the reorder buffer, lest we issue an instruction that uses g[255] or rB as an operand.

```
⟨Special cases for states in the first stage 266⟩ +≡
emulate_virt: ⟨Prepare to emulate the page translation 309⟩;
state_4: data→state = 4;
case 4: if (dispatch_lock) wait(1);
  set_lock(self, dispatch_lock);
state_5: data→state = 5;
case 5: if (data ≠ old_hot) wait(1);
  if ((data→interrupt & F_BIT) ∧ data→i ≠ trap) {
    inst_ptr.o = g[rT].o, inst_ptr.p = Λ;
    if (is_load_store(data→i)) nullifying = true;
  }
  if (data→interrupt & #ff) {
    g[rQ].o.h |= data→interrupt & #ff;
    new_Q.h |= data→interrupt & #ff;
    if (verbose & issue_bit) {
      printf("␣setting␣rQ="); print_octa(g[rQ].o); printf("\n");
    }
  }
  goto die;
```

**311.** The instructions of the previous section appear in the switch for coroutine stage 1 only. We need to use them also in later stages.

```
⟨Special cases for states in later stages 272⟩ +≡
case 4: goto state_4;
case 5: goto state_5;
```

**312.**

```
⟨Special cases of instruction dispatch 117⟩ +≡
case trap: if ((flags[op] & X_is_dest_bit) ∧ cool→xx < cool_G ∧ cool→xx ≥ cool_L)
    goto increase_L;
  if (¬g[rT].up→known ∨ ¬g[rJ].up→known) goto stall;
  inst_ptr = specval(&g[rT]);   /* traps and emulated ops */
  cool→need_b = true, cool→b = specval(&g[255]);
case trip:
  if (¬g[rJ].up→known) goto stall;
  cool→ren_x = true, spec_install(&g[255], &cool→x);
  cool→x.known = true, cool→x.o = g[rJ].up→o;
  if (i ≡ trip) cool→go.o = zero_octa;
  cool→ren_a = true, spec_install(&g[i ≡ trap ? rBB : rB], &cool→a); break;
```

**313.**

```
⟨Cases for stage 1 execution 155⟩ +≡
case trap: data→interrupt |= F_BIT; data→a.o = data→b.o; goto fin_ex;
case trip: data→interrupt |= H_BIT; data→a.o = data→b.o; goto fin_ex;
```

**314.** The following check is performed at the beginning of every cycle. An instruction in the hot seat can be externally interrupted only if it is ready to be committed and not already marked for tripping or trapping.

```
⟨Check for external interrupt 314⟩ ≡
  g[rI].o = incr(g[rI].o, −1);
  if (g[rI].o.l ≡ 0 ∧ g[rI].o.h ≡ 0) {
    g[rQ].o.l |= INTERVAL_TIMEOUT, new_Q.l |= INTERVAL_TIMEOUT;
    if (verbose & issue_bit) {
      printf("␣setting␣rQ="); print_octa(g[rQ].o); printf("\n");
    }
  }
  trying_to_interrupt = false;
  if (((g[rQ].o.h & g[rK].o.h) ∨ (g[rQ].o.l & g[rK].o.l)) ∧ cool ≠ hot ∧
      ¬(hot→interrupt & (E_BIT + F_BIT + H_BIT)) ∧ ¬doing_interrupt ∧
      ¬(hot→i ≡ resum)) {
    if (hot→owner) trying_to_interrupt = true;
    else {
      hot→interrupt |= E_BIT;
      ⟨Deissue all but the hottest command 316⟩;
      inst_ptr.o = g[rTT].o; inst_ptr.p = Λ;
    }
  }
```

This code is used in section 64.

**315.**

```
⟨Global variables 20⟩ +≡
  bool trying_to_interrupt;   /* encouraging interruptible operations to pause */
  bool nullifying;            /* stopping dispatch to nullify a load/store command */
```
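Two ingredients of the external-interrupt check are easy to isolate: the interval counter rI, which raises the INTERVAL\_TIMEOUT bit of rQ when it reaches zero, and the test that some bit is set in both rQ and rK. The sketch below models just those two pieces under simplifying assumptions; the `octa` struct here is a stand-in with two 32-bit halves, and `tick_interval` and `interrupt_pending` are names invented for the example. The further conditions on the hot seat are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INTERVAL_TIMEOUT (1u << 6)

typedef struct { uint32_t h, l; } octa;   /* high and low tetrabytes */

/* Count rI down by one; when it reaches zero, raise the timer bit in rQ. */
static void tick_interval(octa *rI, octa *rQ) {
    assert(rI != NULL && rQ != NULL);
    if (rI->l-- == 0) rI->h--;            /* 64-bit decrement with borrow */
    if (rI->l == 0 && rI->h == 0) rQ->l |= INTERVAL_TIMEOUT;
}

/* An interrupt is requested when some bit is set in both rQ and rK. */
static int interrupt_pending(octa rQ, octa rK) {
    return ((rQ.h & rK.h) | (rQ.l & rK.l)) != 0;
}
```

A request bit in rQ has no effect until the corresponding mask bit of rK is also set, which is why the handler entry code of section 317 can disable further interrupts simply by zeroing rK.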

```
a: specnode, §44.
b: spec, §44.
bool = enum, §11.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
data: register control *, §124.
die: label, §144.
dispatch_lock: lockvar, §65.
doing_interrupt: int, §65.
E_BIT = 1 ≪ 18, §54.
F_BIT = 1 ≪ 17, §54.
false = 0, §11.
fin_ex: label, §144.
flags: unsigned char [], §83.
g: specnode [], §86.
go: specnode, §44.
h: tetra, §17.
H_BIT = 1 ≪ 16, §54.
hot: control *, §60.
i: internal_opcode, §44.
i: register int, §12.
incr: octa (), MMIX-ARITH §6.
increase_L: label, §110.
inst_ptr: spec, §284.
interrupt: unsigned int, §44.
INTERVAL_TIMEOUT = 1 ≪ 6, §57.
is_load_store = macro (), §307.
issue_bit = 1 ≪ 0, §8.
known: bool, §40.
l: tetra, §17.
need_b: bool, §44.
new_Q: octa, §148.
o: octa, §40.
old_hot: control *, §60.
op: register mmix_opcode, §75.
owner: coroutine *, §44.
p: specnode *, §40.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
rB = 0, §52.
rBB = 7, §52.
ren_a: bool, §44.
ren_x: bool, §44.
resum = 89, §49.
rI = 12, §52.
rJ = 4, §52.
rK = 15, §52.
rQ = 16, §52.
rT = 13, §52.
rTT = 14, §52.
self: register coroutine *, §124.
set_lock = macro (), §37.
spec_install: static void (), §95.
specval: static spec (), §93.
stall: label, §75.
state: int, §44.
trap = 82, §49.
trip = 83, §49.
true = 1, §11.
up: specnode *, §40.
verbose: int, §4.
wait = macro (), §125.
x: specnode, §44.
X_is_dest_bit = #20, §83.
xx: unsigned char, §44.
zero_octa: octa, MMIX-ARITH §4.
```

**316.** It's possible that the command in the hot seat has been deissued, but only if the simulator has done so at the user's request. Otherwise the test ' $i \ge deissues$ ' here will always succeed.

The value of *cool\_hist* becomes flaky here. We could try to keep it strictly up to date, but the unpredictable nature of external interrupts suggests that we are better off leaving it alone. (It's only a heuristic for branch prediction, and a sufficiently strong prediction will survive one-time glitches due to interrupts.)

```
⟨Deissue all but the hottest command 316⟩ ≡
  i = issued_between(hot, cool);
  if (i ≥ deissues) {
    deissues = i;
    tail = head; resuming = 0;
    ⟨Restart the fetch coroutine 287⟩;
    if (is_load_store(hot→i)) nullifying = true;
  }
```

This code is used in section 314.

**317.** Even though an interrupted instruction has officially been either "committed" or "nullified," it stays in the hot seat for two or three extra cycles, while we save enough of the machine state to resume the computation later.

```
⟨Begin an interruption and break 317⟩ ≡
  {
    if (¬(hot→interrupt & H_BIT)) g[rK].o = zero_octa;   /* trap */
    if (((hot→interrupt & H_BIT) ∧ hot→i ≠ trip) ∨
        ((hot→interrupt & F_BIT) ∧ hot→i ≠ trap) ∨
        (hot→interrupt & E_BIT)) doing_interrupt = 3, suppress_dispatch = true;
    else doing_interrupt = 2;   /* trip or trap started by dispatcher */
    break;
  }
```

This code is used in section 146.

**318.** If a memory failure occurs, we should set rF here, either in case 2 or case 1. The simulator doesn't do anything with rF at present.

```
⟨Perform one cycle of the interrupt preparations 318⟩ ≡
  switch (doing_interrupt−−) {
  case 3: ⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩; break;
  case 2: ⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩; break;
  case 1: ⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩;
    if (hot ≡ reorder_bot) hot = reorder_top; else hot−−;
    break;
  }
```

This code is used in section 64.
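The "two or three extra cycles" of section 317 and the countdown of section 318 amount to a tiny state machine. The following sketch restates it outside the simulator; `interrupt_cycles` and `saved_pair` are hypothetical helpers, and the flags `is_trip` and `is_trap` stand for the tests *hot*→*i* ≡ *trip* and *hot*→*i* ≡ *trap*. A TRIP or TRAP issued by the dispatcher needs only two cycles, because the dispatch code of section 312 has already arranged the exchange involving $255 and rB or rBB.

```c
#include <assert.h>

enum { H_BIT = 1 << 16, F_BIT = 1 << 17, E_BIT = 1 << 18 };

/* Initial value of doing_interrupt for an instruction in the hot seat. */
static int interrupt_cycles(unsigned interrupt, int is_trip, int is_trap) {
    if (((interrupt & H_BIT) && !is_trip) ||
        ((interrupt & F_BIT) && !is_trap) ||
        (interrupt & E_BIT)) return 3;
    return 2;                      /* trip or trap started by dispatcher */
}

/* Register pair saved in the cycle in which doing_interrupt has this value. */
static const char *saved_pair(int doing_interrupt, int trip) {
    assert(doing_interrupt >= 1 && doing_interrupt <= 3);
    switch (doing_interrupt) {
    case 3: return trip ? "rB,$255" : "rBB,$255";
    case 2: return trip ? "rW,rX" : "rWW,rXX";
    default: return trip ? "rY,rZ" : "rYY,rZZ";
    }
}
```

An arithmetic exception (H\_BIT on an instruction that is not a TRIP) therefore passes through all three cases, while an explicit TRIP starts at case 2.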

**319.**

```
⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩ ≡
  j = hot→interrupt & H_BIT;
  g[j ? rB : rBB].o = g[255].o;
  g[255].o = g[rJ].o;
  if (verbose & issue_bit) {
    if (j) {
      printf("␣setting␣rB="); print_octa(g[rB].o);
    } else {
      printf("␣setting␣rBB="); print_octa(g[rBB].o);
    }
    printf(",␣$255="); print_octa(g[255].o); printf("\n");
  }
```

cool: control \*, §60. cool\_hist: unsigned int, §99. deissues: int, §60. doing\_interrupt: int, §65. E\_BIT =  $1 \ll 18$ , §54. F\_BIT =  $1 \ll 17$ , §54. g: specnode [], §86. H\_BIT =  $1 \ll 16$ , §54. head: fetch \*, §69. hot: control \*, §60. i: internal\_opcode, §44. i: register int, §12. interrupt: unsigned int, §44.

reorder\_top: control \*, §60. resuming: int, §78. rJ = 4, §52. rK = 15, §52. suppress\_dispatch: bool, §65. tail: fetch \*, §69. trap = 82, §49. trip = 83, §49. true = 1, §11. verbose: int, §4. zero\_octa: octa, MMIX-ARITH §4.

**320.** Here's where we manufacture the "ropcodes" for resumption.

```
#define RESUME_AGAIN 0   /* repeat the command in rX as if in location rW − 4 */
#define RESUME_CONT 1    /* same, but substitute rY and rZ for operands */
#define RESUME_SET 2     /* set register $X to rZ */
#define RESUME_TRANS 3   /* install (rY, rZ) into IT-cache or DT-cache, then RESUME_AGAIN */
#define pack_bytes(a, b, c, d) ((((((unsigned)(a) ≪ 8) + (b)) ≪ 8) + (c)) ≪ 8) + (d)
⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩ ≡
  j = pack_bytes(hot→op, hot→xx, hot→yy, hot→zz);
  if (hot→interrupt & H_BIT) {   /* trip */
    g[rW].o = incr(hot→loc, 4);
    g[rX].o.h = sign_bit, g[rX].o.l = j;
    if (verbose & issue_bit) {
      printf("␣setting␣rW="); print_octa(g[rW].o);
      printf(",␣rX="); print_octa(g[rX].o); printf("\n");
    }
  } else {   /* trap */
    g[rWW].o = hot→go.o;
    g[rXX].o.l = j;
    if (hot→interrupt & F_BIT) {   /* forced */
      if (hot→i ≠ trap) j = RESUME_TRANS;   /* emulate page translation */
      else if (hot→op ≡ TRAP) j = #80;   /* TRAP */
      else if (flags[internal_op[hot→op]] & X_is_dest_bit) j = RESUME_SET;   /* emulation */
      else j = #80;   /* emulation when r[X] is not a destination */
    } else {   /* dynamic */
      if (hot→interim)
        j = (hot→i ≡ frem ∨ hot→i ≡ syncd ∨ hot→i ≡ syncid ? RESUME_CONT : RESUME_AGAIN);
      else if (is_load_store(hot→i)) j = RESUME_AGAIN;
      else j = #80;   /* normal external interruption */
    }
    g[rXX].o.h = (j ≪ 24) + (hot→interrupt & #ff);
    if (verbose & issue_bit) {
      printf("␣setting␣rWW="); print_octa(g[rWW].o);
      printf(",␣rXX="); print_octa(g[rXX].o); printf("\n");
    }
  }
```

This code is used in section 318.
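To make the layout of rXX concrete, here is a small sketch of the two packing steps used above: the interrupted instruction goes into the low tetrabyte byte by byte, and the high tetrabyte receives the ropcode (or #80, which sets the sign bit) in its top byte with the eight trap bits in its bottom byte. The function form of `pack_bytes` and the helper `rxx_high` are assumptions made for the example; the simulator itself uses the macro shown in section 320.

```c
#include <assert.h>
#include <stdint.h>

enum { RESUME_AGAIN, RESUME_CONT, RESUME_SET, RESUME_TRANS };

/* Pack four bytes, most significant first, into one tetrabyte. */
static uint32_t pack_bytes(unsigned a, unsigned b, unsigned c, unsigned d) {
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return ((((((uint32_t)a << 8) + b) << 8) + c) << 8) + d;
}

/* High tetrabyte of rXX: ropcode (or 0x80) on top, trap bits at the bottom. */
static uint32_t rxx_high(unsigned ropcode, uint32_t interrupt) {
    return ((uint32_t)ropcode << 24) + (interrupt & 0xff);
}
```

When the top byte is #80 the sign bit of rXX is set, and the RESUME code of section 322 then inserts no instruction at all; a ropcode of 0 through 3 leaves the sign bit clear and selects one of the four resumption modes.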

**321.**

```
⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩ ≡
  j = hot→interrupt & H_BIT;
  if ((hot→interrupt & F_BIT) ∧ hot→op ≡ SWYM) g[rYY].o = hot→go.o;
  else g[j ? rY : rYY].o = hot→y.o;
  if (hot→i ≡ st ∨ hot→i ≡ pst) g[j ? rZ : rZZ].o = hot→x.o;
  else g[j ? rZ : rZZ].o = hot→z.o;
  if (verbose & issue_bit) {
    if (j) {
      printf("␣setting␣rY="); print_octa(g[rY].o);
      printf(",␣rZ="); print_octa(g[rZ].o); printf("\n");
    } else {
      printf("␣setting␣rYY="); print_octa(g[rYY].o);
      printf(",␣rZZ="); print_octa(g[rZZ].o); printf("\n");
    }
  }
```

This code is used in section 318.

```
F_BIT = 1 \ll 17, \S 54.
flags: unsigned char [], §83.
frem = 25, \S 49.
g: specnode [], §86.
go: specnode, §44.
h: tetra, §17.
H_BIT = 1 \ll 16, \S 54.
hot: \mathbf{control} *, \S 60.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
interim: bool, §44.
internal_op: internal_opcode
  [], §51.
interrupt: unsigned int, §44.
is\_load\_store = macro(), §307.
issue\_bit = 1 \ll 0, \S 8.
```

```
j: register int, §12.
l: tetra, §17.
loc: octa, §44.
o: octa, §40.
op: mmix_opcode, §44.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
pst = 66, \S 49.
rW = 24, \S 52.
rWW = 28, \S 52.
rX = 25, \S 52.
rXX = 29, \S 52.
rY = 26, \S 52.
rYY = 30, \S 52.
rZ = 27, \S 52.
rZZ = 31, \S 52.
```

```
sign_bit = macro, §80.
st = 63, §49.
SWYM = #fd, §47.
syncd = 64, §49.
syncid = 65, §49.
TRAP = #00, §47.
trap = 82, §49.
verbose: int, §4.
x: specnode, §44.
X_is_dest_bit = #20, §83.
xx: unsigned char, §44.
y: spec, §44.
yy: unsigned char, §44.
z: spec, §44.
zz: unsigned char, §44.
```

**322.** Whew; we've successfully interrupted the computation. The remaining task is to restart it again, as transparently as possible.

The RESUME instruction waits for the pipeline to drain, because it has to do such drastic things. For example, an interrupt may be occurring at this very moment, changing the registers needed for resumption.

```
⟨Special cases of instruction dispatch 117⟩ +≡
case resume: if (cool ≠ old_hot) goto stall;
  inst_ptr = specval(&g[cool→zz ? rWW : rW]);
  if (¬(cool→loc.h & sign_bit)) {
    if (cool→zz) cool→interrupt |= K_BIT;
    else if (inst_ptr.o.h & sign_bit) cool→interrupt |= P_BIT;
  }
  if (cool→interrupt) {
    inst_ptr.o = incr(cool→loc, 4); cool→i = noop;
  } else {
    cool→go.o = inst_ptr.o;
    if (cool→zz) {
      ⟨Magically do an I/O operation, if cool→loc is rT 372⟩;
      cool→ren_a = true, spec_install(&g[rK], &cool→a);
      cool→a.known = true, cool→a.o = g[255].o;
      cool→ren_x = true, spec_install(&g[255], &cool→x);
      cool→x.known = true, cool→x.o = g[rBB].o;
    }
    cool→b = specval(&g[cool→zz ? rXX : rX]);
    if (¬(cool→b.o.h & sign_bit)) ⟨Resume an interrupted operation 323⟩;
  } break;
```

**323.** Here we set *cool*→*i* = *resum*, since we want to issue another instruction after the RESUME itself.

The restrictions on inserted instructions are designed to ensure that those instructions will be the very next ones issued. (If, for example, an *incgamma* instruction were necessary, it might cause a page fault and we'd lose the operand values for RESUME\_SET or RESUME\_CONT.)

A subtle point arises here: If RESUME\_TRANS is being used to compute the page translation of virtual address zero, we don't want to execute the dummy SWYM instruction from virtual address -4! So we avoid the SWYM altogether.

```
⟨Resume an interrupted operation 323⟩ ≡
  {
    cool→xx = cool→b.o.h ≫ 24, cool→i = resum;
    head→loc = incr(inst_ptr.o, −4);
    switch (cool→xx) {
    case RESUME_SET: cool→b.o.l = (SETH ≪ 24) + (cool→b.o.l & #ff0000);
      head→interrupt |= cool→b.o.h & #ff00;
      resuming = 2;
    case RESUME_CONT: resuming += 1 + cool→zz;
      if (((cool→b.o.l ≫ 24) & #fa) ≠ #b8) {
        m = cool→b.o.l ≫ 28;
        if ((1 ≪ m) & #ff30) goto bad_resume;
        m = (cool→b.o.l ≫ 16) & #ff;
        if (m ≥ cool_L ∧ m < cool_G) goto bad_resume;
      }
    case RESUME_AGAIN: resume_again: head→inst = cool→b.o.l;
      m = head→inst ≫ 24;
      if (m ≡ RESUME) goto bad_resume;   /* avoid uninterruptible loop */
      if (¬cool→zz ∧ m > RESUME ∧ m ≤ SYNC ∧ (head→inst & bad_inst_mask[m − RESUME]))
        head→interrupt |= B_BIT;
      head→noted = false; break;
    case RESUME_TRANS: if (cool→zz) {
        cool→y = specval(&g[rYY]), cool→z = specval(&g[rZZ]);
        if ((cool→b.o.l ≫ 24) ≠ SWYM) goto resume_again;
        cool→i = resume; break;   /* see "subtle point" above */
      }
    default: bad_resume: cool→interrupt |= B_BIT, cool→i = noop;
      resuming = 0; break;
    }
  }
```

This code is used in section 322.
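
The RESUME_SET case above can be illustrated in isolation. The following is a minimal sketch, not part of MMIX-PIPE: the function names `ropcode_of` and `seth_from_rxx` are hypothetical, and rXX is modeled as two plain 32-bit halves. Only the constant SETH = #e0 and the shifts and masks are taken from the code of section 323.

```c
#include <stdint.h>

#define SETH 0xe0u   /* opcode value as listed in the mini-index (section 47) */

/* The ropcode is the most significant byte of the high tetrabyte of rXX. */
unsigned ropcode_of(uint32_t rxx_hi) { return rxx_hi >> 24; }

/* RESUME_SET: keep only the X field (bits 16..23) of the low tetrabyte and
   place the SETH opcode in front of it; the Y and Z fields become zero. */
uint32_t seth_from_rxx(uint32_t rxx_lo) {
    return (SETH << 24) + (rxx_lo & 0xff0000u);
}
```

For instance, a saved instruction `#20123456` would be turned into `#e0120000` under this sketch, so that only the destination register survives from the interrupted instruction.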

```
a: specnode, §44.
b: spec, §44.
B_BIT = 1 ≪ 2, §54.
bad_inst_mask: int [], §305.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
false = 0, §11.
g: specnode [], §86.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
incgamma = 84, §49.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
inst_ptr: spec, §284.
interrupt: unsigned int, §44.
interrupt: unsigned int, §68.
K_BIT = 1 ≪ 3, §54.
known: bool, §40.
l: tetra, §17.
loc: octa, §44.
loc: octa, §68.
m: register int, §12.
noop = 81, §49.
noted: bool, §68.
o: octa, §40.
old_hot: control *, §60.
P_BIT = 1 ≪ 0, §54.
rBB = 7, §52.
ren_a: bool, §44.
ren_x: bool, §44.
resum = 89, §49.
resume = 76, §49.
RESUME = #f9, §47.
RESUME_AGAIN = 0, §320.
RESUME_CONT = 1, §320.
RESUME_SET = 2, §320.
RESUME_TRANS = 3, §320.
resuming: int, §78.
rK = 15, §52.
rW = 24, §52.
rWW = 28, §52.
rX = 25, §52.
rXX = 29, §52.
rYY = 30, §52.
rZZ = 31, §52.
SETH = #e0, §47.
sign_bit = macro, §80.
spec_install: static void (), §95.
specval: static spec (), §93.
stall: label, §75.
SWYM = #fd, §47.
SYNC = #fc, §47.
syncd = 64, §49.
syncid = 65, §49.
true = 1, §11.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
zz: unsigned char, §44.
```

**324.**

```
⟨ Insert special operands when resuming an interrupted operation 324 ⟩ ≡
   if (resuming & 1) {
      cool→y = specval(&g[rY]);
      cool→z = specval(&g[rZ]);
   } else {
      cool→y = specval(&g[rYY]);
      cool→z = specval(&g[rZZ]);
   }
   if (resuming ≥ 3) {   /* RESUME_SET */
      cool→need_ra = true, cool→ra = specval(&g[rA]);
   }
   cool→usage = false;
```

This code is used in section 103.

**325.**

```
#define do_resume_trans 17   /* state for performing RESUME_TRANS actions */

⟨ Cases for stage 1 execution 155 ⟩ +≡
case resume: case resum: if (data→xx ≠ RESUME_TRANS) goto fin_ex;
   data→ptr_a = (void *)((data→b.o.l ≫ 24) ≡ SWYM ? ITcache : DTcache);
   data→state = do_resume_trans;
   data→z.o = incr(oandn(data→z.o, page_mask), data→z.o.l & 7);
   data→z.o.h &= #ffff;
   goto resume_trans;
```

**326.**

```
⟨ Special cases for states in the first stage 266 ⟩ +≡
case do_resume_trans: resume_trans:
   { register cache *c = (cache *) data→ptr_a;
      if (c→lock) wait(1);
      if (c→filler.next) wait(1);
      p = alloc_slot(c, trans_key(data→y.o));
      if (p) {
         c→filler_ctl.ptr_b = (void *) p;
         c→filler_ctl.y.o = data→y.o;
         c→filler_ctl.b.o = data→z.o;
         c→filler_ctl.state = 1;
         schedule(&c→filler, c→access_time, 1);
      }
      goto fin_ex;
   }
```
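
The address arithmetic of section 325 can be tried out on plain 64-bit integers. This is only a sketch under stated assumptions: `trans_entry` is a hypothetical name, and `page_mask` is assumed to have 1 bits exactly in the page-offset positions (its real definition is in section 238, which is not shown here).

```c
#include <stdint.h>

/* Hypothetical helper mirroring section 325:
   oandn(z, page_mask) clears the bits selected by page_mask,
   incr(..., z.o.l & 7) adds back the three low-order bits,
   and z.o.h &= #ffff keeps only the low 48 bits of the result. */
uint64_t trans_entry(uint64_t rzz, uint64_t page_mask) {
    uint64_t v = (rzz & ~page_mask) + (rzz & 7);
    return v & 0x0000ffffffffffffULL;
}
```

With an assumed 8K-byte page (`page_mask = 0x1fff`), the value `#abcd123412345677` becomes `#0000123412344007`: the offset within the page is dropped, the three low bits are kept, and the top 16 bits are cleared.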

**327.** Administrative operations. The internal instructions that handle the register stack simply reduce to things we already know how to do. (Well, the internal instructions for saving and unsaving do sometimes lead to special cases, based on $data \rightarrow op$; for the most part, though, the necessary mechanisms are already present.)

```
⟨ Cases for stage 1 execution 155 ⟩ +≡
case noop: if (data→interrupt & F_BIT) goto emulate_virt;
case incrl: case unsave: goto fin_ex;
case jmp: case pushj: data→go.o = data→z.o;
   goto fin_ex;
case sav: if (¬(data→mem_x)) goto fin_ex;
case incgamma: case save: data→i = st;
   goto switch1;
case decgamma: case unsav: data→i = ld;
   goto switch1;
```

**328.** We can GET special registers ≥ 21 (that is, rA, rF, rP, rW-rZ, or rWW-rZZ) only in the hot seat, because those registers are implicit outputs of many instructions. The same applies to rK, since it is changed by TRAP and by emulated instructions. Likewise, rQ must not be prematurely gotten.

```
⟨ Cases for stage 1 execution 155 ⟩ +≡
case get: if (data→zz ≥ 21 ∨ data→zz ≡ rK ∨ data→zz ≡ rQ) {
      if (data ≠ old_hot) wait(1);
      data→z.o = g[data→zz].o;
   }
   data→x.o = data→z.o; goto fin_ex;
```
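
The test that decides whether a GET must wait for the hot seat is simple enough to isolate. The sketch below assumes nothing beyond the register codes rK = 15 and rQ = 16 from the mini-index; the function name is hypothetical.

```c
#include <stdbool.h>

enum { rK = 15, rQ = 16 };   /* special register codes (section 52) */

/* True if GET of special register zz may only read g[zz] in the hot seat. */
bool get_needs_hot_seat(unsigned zz) {
    return zz >= 21 || zz == rK || zz == rQ;
}
```

Registers such as rG (19) and rL (20) fail this test, so a GET of those registers does not wait for the hot seat.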

```
access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
b: spec, §44.
cache = struct, §167.
cool: control *, §60.
data: register control *, §124.
decgamma = 85, §49.
DTcache: cache *, §168.
emulate_virt: label, §310.
F_BIT = 1 ≪ 17, §54.
false = 0, §11.
filler: coroutine, §167.
filler_ctl: control, §167.
fin_ex: label, §144.
g: specnode [], §86.
get = 54, §49.
go: specnode, §44.
h: tetra, §17.
i: internal_opcode, §44.
incgamma = 84, §49.
incr: octa (), MMIX-ARITH §6.
incrl = 86, §49.
interrupt: unsigned int, §44.
ITcache: cache *, §168.
jmp = 80, §49.
l: tetra, §17.
ld = 56, §49.
lock: lockvar, §167.
mem_x: bool, §44.
need_ra: bool, §44.
next: coroutine *, §23.
noop = 81, §49.
o: octa, §40.
oandn: octa (), MMIX-ARITH §25.
old_hot: control *, §60.
op: mmix_opcode, §44.
p: register cacheblock *, §258.
page_mask: octa, §238.
ptr_a: void *, §44.
ptr_b: void *, §44.
pushj = 71, §49.
rA = 21, §52.
ra: spec, §44.
resum = 89, §49.
resume = 76, §49.
RESUME_SET = 2, §320.
RESUME_TRANS = 3, §320.
resuming: int, §78.
rK = 15, §52.
rQ = 16, §52.
rY = 26, §52.
rYY = 30, §52.
rZ = 27, §52.
rZZ = 31, §52.
sav = 87, §49.
save = 77, §49.
schedule: static void (), §28.
specval: static spec (), §93.
st = 63, §49.
state: int, §44.
switch1: label, §130.
SWYM = #fd, §47.
trans_key = macro(), §240.
true = 1, §11.
unsav = 88, §49.
unsave = 78, §49.
usage: bool, §44.
wait = macro(), §125.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
zz: unsigned char, §44.
```

**329.** A PUT is, similarly, delayed in the cases that hold *dispatch\_lock*. This program does not restrict the 1 bits that might be PUT into rQ, although the contents of that register can have drastic implications.

```
⟨ Cases for stage 1 execution 155 ⟩ +≡
case put: if (data→xx ≡ 8 ∨ (data→xx ≥ 15 ∧ data→xx ≤ 20)) {
      if (data ≠ old_hot) wait(1);
      switch (data→xx) {
      case rV: ⟨ Update the page variables 239 ⟩; break;
      case rQ: new_Q.h |= data→z.o.h & ~g[rQ].o.h; new_Q.l |= data→z.o.l & ~g[rQ].o.l;
         data→z.o.l |= new_Q.l; data→z.o.h |= new_Q.h; break;
      case rL: if (data→z.o.h ≠ 0) data→z.o.h = 0, data→z.o.l = g[rL].o.l;
         else if (data→z.o.l > g[rL].o.l) data→z.o.l = g[rL].o.l;
      default: break;
      case rG: ⟨ Update rG 330 ⟩; break;
      }
   } else if (data→xx ≡ rA ∧ (data→z.o.h ≠ 0 ∨ data→z.o.l ≥ #40000))
      data→interrupt |= B_BIT, data→z.o.h = 0, data→z.o.l &= #3ffff;
   data→x.o = data→z.o; goto fin_ex;
```
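
The adjustments that section 329 applies to the value being PUT can be sketched on plain 64-bit integers. The three helper names below are hypothetical and the registers are modeled as ordinary variables. The reading of `new_Q` as an accumulator of newly set rQ bits is an interpretation of the code above, not a statement taken from it.

```c
#include <stdint.h>
#include <stdbool.h>

/* PUT rL can only decrease rL: larger values are replaced by the current rL. */
uint64_t put_rL(uint64_t z, uint64_t cur_rL) {
    return z > cur_rL ? cur_rL : z;
}

/* PUT rA: values of #40000 or more are masked to 18 bits.
   Returns true when the B_BIT interrupt would be raised. */
bool put_rA(uint64_t *z) {
    if (*z >= 0x40000) { *z &= 0x3ffff; return true; }
    return false;
}

/* PUT rQ: bits of z not yet present in rQ are recorded in *new_Q,
   and every bit recorded in *new_Q is forced into the stored value. */
uint64_t put_rQ(uint64_t z, uint64_t cur_rQ, uint64_t *new_Q) {
    *new_Q |= z & ~cur_rQ;
    return z | *new_Q;
}

/* Convenience wrappers so the effects can be checked as single expressions. */
uint64_t put_rA_value(uint64_t z) { put_rA(&z); return z; }
uint64_t put_rQ_value(uint64_t z, uint64_t cur_rQ, uint64_t new_Q) {
    return put_rQ(z, cur_rQ, &new_Q);
}
```

For example, with rQ = 5, a pending `new_Q` of 4 and a PUT of the value 3, the stored result under this sketch is 7: the pending bit is not lost even though the program tried to clear it.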

**330.** When rG decreases, we assume that up to *commit\_max* marginal registers can be zeroed during each clock cycle. (Remember that we're currently in the hot seat, and holding *dispatch\_lock*.)

```
⟨ Update rG 330 ⟩ ≡
   if (data→z.o.h ≠ 0 ∨ data→z.o.l ≥ 256 ∨ data→z.o.l < g[rL].o.l ∨ data→z.o.l < 32)
      data→interrupt |= B_BIT, data→z.o = g[rG].o;
   else if (data→z.o.l < g[rG].o.l) {
      data→interim = true;   /* potentially interruptible */
      for (j = 0; j < commit_max; j++) {
         g[rG].o.l--;
         g[g[rG].o.l].o = zero_octa;
         if (data→z.o.l ≡ g[rG].o.l) break;
      }
      if (j ≡ commit_max) {
         if (¬trying_to_interrupt) wait(1);
      } else data→interim = false;
   }
```

This code is used in section 329.
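
To see how many clock cycles a decrease of rG takes under this scheme, the loop of section 330 can be replayed outside the simulator. This is a sketch only: the global registers are modeled as a bare array, rG as a separate variable, and all function names are hypothetical. Interrupts (`trying_to_interrupt`) are ignored.

```c
#include <stdint.h>
#include <stdbool.h>

/* The validity test applied to a value z that is about to be PUT into rG. */
bool rG_value_ok(uint64_t z, unsigned rL) {
    return z < 256 && z >= rL && z >= 32;
}

/* One clock cycle: zero at most commit_max newly global registers.
   Requires target < *rG on entry. Returns true when rG has reached target. */
bool lower_rG_one_cycle(uint64_t g[256], unsigned *rG, unsigned target, int commit_max) {
    int j;
    for (j = 0; j < commit_max; j++) {
        (*rG)--;
        g[*rG] = 0;
        if (*rG == target) break;
    }
    return j != commit_max;   /* false means another cycle is needed */
}

/* Number of cycles needed to bring rG from 'from' down to 'to' (to < from). */
int cycles_to_lower_rG(unsigned from, unsigned to, int commit_max) {
    uint64_t g[256] = {0};
    unsigned rG = from;
    int cycles = 1;
    while (!lower_rG_one_cycle(g, &rG, to, commit_max)) cycles++;
    return cycles;
}
```

Note that a cycle in which the loop runs to completion always requests another cycle; the target can only be detected by the `break`. With `commit_max = 2`, lowering rG from 40 to 35 therefore takes three cycles in this sketch, and lowering it from 40 to 38 takes one.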

**331.** Computed jumps put the desired destination address into the *go* field.

```
⟨ Cases for stage 1 execution 155 ⟩ +≡
case go: data→x.o = data→go.o; goto add_go;
case pop: data→x.o = data→y.o;
   data→y.o = data→b.o;   /* move rJ to y field */
case pushgo: add_go: data→go.o = oplus(data→y.o, data→z.o);
   if ((data→go.o.h & sign_bit) ∧ ¬(data→loc.h & sign_bit)) data→interrupt |= P_BIT;
   data→go.known = true; goto fin_ex;
```
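
The destination computation and the P_BIT test of section 331 reduce to a few lines when octabytes are modeled as `uint64_t`. The names below are hypothetical; the sketch assumes that *oplus* is ordinary addition modulo 2^64 and that *sign_bit* selects the most significant bit.

```c
#include <stdint.h>
#include <stdbool.h>

#define SIGN_BIT64 0x8000000000000000ULL

/* Destination of a computed jump: y + z, wrapping modulo 2^64. */
uint64_t jump_target(uint64_t y, uint64_t z) { return y + z; }

/* P_BIT is raised when an instruction at a nonnegative location
   jumps to an address whose sign bit is set. */
bool sets_p_bit(uint64_t loc, uint64_t target) {
    return (target & SIGN_BIT64) != 0 && (loc & SIGN_BIT64) == 0;
}
```

A jump between two negative addresses does not raise P_BIT in this sketch, since the test looks at the location of the jumping instruction as well as at the target.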

**332.** The instruction UNSAVE z generates a sequence of internal instructions that accomplish the actual unsaving. This sequence is controlled by the instruction currently in the fetch buffer, which changes its X and Y fields until all global registers have been loaded. The first instructions of the sequence are UNSAVE 0,0,z; UNSAVE 1,rZ,z-8; UNSAVE 1,rY,z-16; ...; UNSAVE 1,rB,z-96; UNSAVE 2,255,z-104; UNSAVE 2,254,z-112; etc. If an interrupt occurs before these instructions have all been committed, the execution register will contain enough information to restart the process.

After the global registers have all been loaded, UNSAVE continues by acting rather like POP. An interrupt occurring during this last stage will find rS < rO; a context switch might then take us back to restoring the local registers again. But no information will be lost, even though the register from which we began unsaving has long since been replaced.

```
⟨ Special cases of instruction dispatch 117 ⟩ +≡
case unsave: if (cool→interrupt & B_BIT) cool→i = noop;
   else {
      cool→interim = true;
      op = LDOU;   /* this instruction needs to be handled by load/store unit */
      cool→i = unsav;
      switch (cool→xx) {
      case 0: if (cool→z.p) goto stall;
         ⟨ Set up the first phase of unsaving 334 ⟩; break;
      case 1: case 2: ⟨ Generate an instruction to unsave g[yy] 333 ⟩; break;
      case 3: cool→i = unsave, cool→interim = false, op = UNSAVE;
         goto pop_unsave;
      default: cool→interim = false, cool→i = noop, cool→interrupt |= B_BIT; break;
      }
   }
   break;   /* this takes us to dispatch_done */
```
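
The sequence of internal UNSAVE instructions described in section 332 can be enumerated with a small sketch that follows the stepping rules of ⟨ Get ready for the next step of UNSAVE 335 ⟩. The function names are hypothetical, and `pack_bytes` is assumed to concatenate its four arguments into one tetrabyte, most significant byte first (its real definition is in section 320). The Z field is left at zero, as in section 335.

```c
#include <stdint.h>

enum { rR = 6, rP = 23, rZ = 27 };   /* special register codes (section 52) */
#define UNSAVE_OP 0xfbu              /* UNSAVE opcode (section 47) */

/* ASSUMED layout of pack_bytes: opcode, X, Y, Z from high byte to low byte. */
static uint32_t pack_bytes(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    return (a << 24) | (b << 16) | (c << 8) | d;
}

/* Given the current internal UNSAVE instruction, return the next one. */
uint32_t next_unsave(uint32_t inst, unsigned G) {
    unsigned xx = (inst >> 16) & 0xff, yy = (inst >> 8) & 0xff;
    switch (xx) {
    case 0: return pack_bytes(UNSAVE_OP, 1, rZ, 0);
    case 1:
        if (yy == rP) return pack_bytes(UNSAVE_OP, 1, rR, 0);
        if (yy == 0) return pack_bytes(UNSAVE_OP, 2, 255, 0);
        return pack_bytes(UNSAVE_OP, 1, yy - 1, 0);
    case 2:
        if (yy == G) return pack_bytes(UNSAVE_OP, 3, 0, 0);
        return pack_bytes(UNSAVE_OP, 2, yy - 1, 0);
    default: return inst;   /* phase 3 is the final, POP-like step */
    }
}

/* Count the instructions from UNSAVE 0,0 up to and including UNSAVE 3,0. */
unsigned count_unsave_steps(unsigned G) {
    uint32_t inst = pack_bytes(UNSAVE_OP, 0, 0, 0);
    unsigned n = 1;
    while (((inst >> 16) & 0xff) != 3) { inst = next_unsave(inst, G); n++; }
    return n;
}
```

Phase 1 visits the twelve special registers rZ, rY, rX, rW, rP, rR, ..., rB, which agrees with the offsets z-8 through z-96 quoted above. With G = 254 the sketch produces 16 instructions in all.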

```
b: spec, §44.
B_BIT = 1 ≪ 2, §54.
commit_max: int, §59.
cool: control *, §60.
data: register control *, §124.
dispatch_done: label, §101.
dispatch_lock: lockvar, §65.
false = 0, §11.
fin_ex: label, §144.
g: specnode [], §86.
go = 72, §49.
go: specnode, §44.
h: tetra, §17.
i: internal_opcode, §44.
interim: bool, §44.
interrupt: unsigned int, §44.
j: register int, §12.
known: bool, §40.
l: tetra, §17.
LDOU = #8e, §47.
loc: octa, §44.
new_Q: octa, §148.
noop = 81, §49.
o: octa, §40.
old_hot: control *, §60.
op: mmix_opcode, §44.
oplus: octa (), MMIX-ARITH §5.
p: specnode *, §40.
P_BIT = 1 ≪ 0, §54.
pop = 75, §49.
pop_unsave: label, §120.
pushgo = 74, §49.
put = 55, §49.
rA = 21, §52.
rG = 19, §52.
rL = 20, §52.
rQ = 16, §52.
rV = 18, §52.
sign_bit = macro, §80.
stall: label, §75.
true = 1, §11.
trying_to_interrupt: bool, §315.
unsav = 88, §49.
UNSAVE = #fb, §47.
unsave = 78, §49.
wait = macro(), §125.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
yy: unsigned char, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
```

**333.**

```
⟨ Generate an instruction to unsave g[yy] 333 ⟩ ≡
   cool→ren_x = true, spec_install(&g[cool→yy], &cool→x);
   new_O = new_S = incr(cool_O, -1);
   cool→z.o = shift_left(new_O, 3);
   cool→ptr_a = (void *) mem.up;
```

This code is used in section 332.

**334.**

```
⟨ Set up the first phase of unsaving 334 ⟩ ≡
   cool→ren_x = true, spec_install(&g[rG], &cool→x);
   cool→ren_a = true, spec_install(&g[rA], &cool→a);
   new_O = new_S = shift_right(cool→z.o, 3, 1);
   cool→set_l = true, spec_install(&g[rL], &cool→rl);
   cool→ptr_a = (void *) mem.up;
```

This code is used in section 332.

**335.**

```
⟨ Get ready for the next step of UNSAVE 335 ⟩ ≡
   switch (cool→xx) {
   case 0: head→inst = pack_bytes(UNSAVE, 1, rZ, 0); break;
   case 1: if (cool→yy ≡ rP) head→inst = pack_bytes(UNSAVE, 1, rR, 0);
      else if (cool→yy ≡ 0) head→inst = pack_bytes(UNSAVE, 2, 255, 0);
      else head→inst = pack_bytes(UNSAVE, 1, cool→yy - 1, 0); break;
   case 2: if (cool→yy ≡ cool_G) head→inst = pack_bytes(UNSAVE, 3, 0, 0);
      else head→inst = pack_bytes(UNSAVE, 2, cool→yy - 1, 0); break;
   }
```

This code is used in section 81.

**336.**

```
⟨ Handle an internal UNSAVE when it's time to load 336 ⟩ ≡
   if (data→xx ≡ 0) {
      data→a.o = data→x.o; data→a.o.h &= #ffffff;   /* unsaved rA */
      data→x.o.l = data→x.o.h ≫ 24; data→x.o.h = 0;   /* unsaved rG */
      if (data→a.o.h ∨ (data→a.o.l & #fffc0000)) {
         data→a.o.h = 0, data→a.o.l &= #3ffff; data→interrupt |= B_BIT;
      }
      if (data→x.o.l < 32) {
         data→x.o.l = 32; data→interrupt |= B_BIT;
      }
   }
   goto fin_ex;
```

This code is used in section 279.

**337.** Of course SAVE is handled essentially like UNSAVE, but backwards.

```
⟨ Special cases of instruction dispatch 117 ⟩ +≡
case save: if (cool→xx < cool_G) cool→interrupt |= B_BIT;
   if (cool→interrupt & B_BIT) cool→i = noop;
   else if (((cool_S.l - cool_O.l - cool_L - 1) & lring_mask) ≡ 0)
      ⟨ Insert an instruction to advance gamma 113 ⟩
   else {
      cool→interim = true;
      cool→i = sav;
      switch (cool→zz) {
      case 0: ⟨ Set up the first phase of saving 338 ⟩; break;
      case 1: if (cool_O.l ≠ cool_S.l) ⟨ Insert an instruction to advance gamma 113 ⟩
         cool→zz = 2; cool→yy = cool_G;
      case 2: case 3: ⟨ Generate an instruction to save g[yy] 339 ⟩; break;
      default: cool→interim = false, cool→i = noop, cool→interrupt |= B_BIT; break;
      }
   }
   break;
```

**338.** If an interrupt occurs during the first phase, say between two incgamma instructions, the value $cool \rightarrow zz = 1$ will get things restarted properly. (Indeed, if context is saved and unsaved during the interrupt, many incgamma instructions may no longer be necessary.)

```
⟨ Set up the first phase of saving 338 ⟩ ≡
   cool→zz = 1;
   cool→ren_x = true, spec_install(&l[(cool_O.l + cool_L) & lring_mask], &cool→x);
   cool→x.known = true, cool→x.o.h = 0, cool→x.o.l = cool_L;
   cool→set_l = true, spec_install(&g[rL], &cool→rl);
   new_O = incr(cool_O, cool_L + 1);
```

This code is used in section 337.

**339.**

```
⟨ Generate an instruction to save g[yy] 339 ⟩ ≡
   op = STOU;   /* this instruction needs to be handled by load/store unit */
   cool→mem_x = true, spec_install(&mem, &cool→x);
   cool→z.o = shift_left(cool_O, 3);
   new_O = new_S = incr(cool_O, 1);
   if (cool→zz ≡ 3 ∧ cool→yy > rZ) ⟨ Do the final SAVE 340 ⟩
   else cool→b = specval(&g[cool→yy]);
```

This code is used in section 337.

```
a: specnode, §44.
b: spec, §44.
B_BIT = 1 ≪ 2, §54.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
data: register control *, §124.
false = 0, §11.
fin_ex: label, §144.
g: specnode [], §86.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
incgamma = 84, §49.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
interim: bool, §44.
interrupt: unsigned int, §44.
known: bool, §40.
l: tetra, §17.
l: specnode *, §86.
lring_mask: int, §88.
mem: specnode, §115.
mem_x: bool, §44.
new_O: octa, §99.
new_S: octa, §99.
noop = 81, §49.
o: octa, §40.
op: register mmix_opcode,
pack_bytes = macro(), §320.
ptr_a: void *, §44.
rA = 21, §52.
ren_a: bool, §44.
ren_x: bool, §44.
rG = 19, §52.
rl: specnode, §44.
rL = 20, §52.
rP = 23, §52.
rR = 6, §52.
rZ = 27, §52.
sav = 87, §49.
save = 77, §49.
set_l: bool, §44.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
spec_install: static void (), §95.
specval: static spec (), §93.
STOU = #ae, §47.
true = 1, §11.
UNSAVE = #fb, §47.
up: specnode *, §40.
x: specnode, §44.
xx: unsigned char, §44.
yy: unsigned char, §44.
z: spec, §44.
zz: unsigned char, §44.
```

**340.** The final SAVE instruction not only stores rG and rA, it also places the final address in global register X.

```
⟨ Do the final SAVE 340 ⟩ ≡
   {
      cool→i = save;
      cool→interim = false;
      cool→ren_a = true, spec_install(&g[cool→xx], &cool→a);
   }
```

This code is used in section 339.

**341.**

```
⟨ Get ready for the next step of SAVE 341 ⟩ ≡
   switch (cool→zz) {
   case 1: head→inst = pack_bytes(SAVE, cool→xx, 0, 1); break;
   case 2: if (cool→yy ≡ 255) head→inst = pack_bytes(SAVE, cool→xx, 0, 3);
      else head→inst = pack_bytes(SAVE, cool→xx, cool→yy + 1, 2); break;
   case 3: if (cool→yy ≡ rR) head→inst = pack_bytes(SAVE, cool→xx, rP, 3);
      else head→inst = pack_bytes(SAVE, cool→xx, cool→yy + 1, 3); break;
   }
```

This code is used in section 81.

**342.**

```
⟨ Handle an internal SAVE when it's time to store 342 ⟩ ≡
   if (data→interim) data→x.o = data→b.o;
   else {
      if (data ≠ old_hot) wait(1);   /* we need the hottest value of rA */
      data→x.o.h = g[rG].o.l ≪ 24;
      data→x.o.l = g[rA].o.l;
      data→a.o = data→y.o;
   }
   goto fin_ex;
```

This code is used in section 281.
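
The final SAVE stores rG and rA together in a single octabyte, and the first internal UNSAVE takes them apart again. The round trip can be sketched as follows, with octabytes modeled as `uint64_t`. The type and function names are hypothetical; the shifts, masks, and range checks follow the code for handling the internal SAVE and UNSAVE instructions.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    unsigned rG;    /* restored rG, at least 32 */
    uint32_t rA;    /* restored rA, at most 18 bits */
    bool b_bit;     /* true if B_BIT would be raised */
} unsaved_regs;

/* SAVE side: rG goes into the top byte, rA into the low tetrabyte. */
uint64_t pack_rG_rA(unsigned rG, uint32_t rA) {
    return ((uint64_t)(rG & 0xff) << 56) | rA;
}

/* UNSAVE side: split the octabyte and repair out-of-range values. */
unsaved_regs unpack_rG_rA(uint64_t x) {
    unsaved_regs u;
    uint64_t a = x & 0x00ffffffffffffffULL;   /* drop the rG byte */
    u.rG = (unsigned)(x >> 56);
    u.b_bit = false;
    if (a >> 18) { a &= 0x3ffff; u.b_bit = true; }
    if (u.rG < 32) { u.rG = 32; u.b_bit = true; }
    u.rA = (uint32_t)a;
    return u;
}
```

A value written by SAVE unpacks without complaint. A corrupted octabyte, for example one with rG = 10, is repaired to rG = 32 and flagged.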

**343.** More register-to-register ops. Now that we've finished most of the hard stuff, we can relax and fill in the holes that we left in the all-register parts of the execution stages.

First let's complete the fixed point arithmetic operations, by dispensing with multiplication and division.

```
⟨ Cases to compute the results of register-to-register operation 137 ⟩ +≡
case mulu: data→x.o = omult(data→y.o, data→z.o);
   data→a.o = aux;
   goto quantify_mul;
case mul: data→x.o = signed_omult(data→y.o, data→z.o);
   if (overflow) data→interrupt |= V_BIT;
quantify_mul: aux = data→z.o;
   for (j = mul0; aux.l ∨ aux.h; j++) aux = shift_right(aux, 8, 1);
   data→i = j; break;   /* j is mul0 or mul1 or ... or mul8 */
case divu: data→x.o = odiv(data→b.o, data→y.o, data→z.o);
   data→a.o = aux; data→i = div; break;
case div: if (data→z.o.l ≡ 0 ∧ data→z.o.h ≡ 0) {
      data→interrupt |= D_BIT; data→a.o = data→y.o;
         /* divide by zero needn't wait in the pipeline */
   } else {
      data→x.o = signed_odiv(data→y.o, data→z.o);
      if (overflow) data→interrupt |= V_BIT;
      data→a.o = aux;
   } break;
```
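
The loop at *quantify_mul* classifies a multiplication by the number of significant bytes in the second operand. A minimal sketch on a plain `uint64_t` is shown below, assuming mul0 = 0 as listed in the mini-index; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns k in 0..8 such that the internal opcode would be mul0 + k:
   the number of bytes needed to hold z, with 0 for z = 0. */
int mul_class(uint64_t z) {
    int j = 0;
    while (z != 0) { z >>= 8; j++; }
    return j;
}
```

A multiplier of `#100` lands in class 2 and a full 64-bit multiplier in class 8, so the pipeline configuration can assign shorter latencies to multiplications by small operands.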

```
a: specnode, §44.
aux: octa, MMIX-ARITH §4.
b: spec, §44.
cool: control *, §60.
D_BIT = 1 ≪ 15, §54.
data: register control *, §124.
div = 9, §49.
divu = 28, §49.
false = 0, §11.
fin_ex: label, §144.
g: specnode [], §86.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
inst: tetra, §68.
interim: bool, §44.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
mul = 26, §49.
mul0 = 0, §49.
mul1 = 1, §49.
mul8 = 8, §49.
mulu = 27, §49.
o: octa, §40.
odiv: octa (), MMIX-ARITH §13.
old_hot: control *, §60.
omult: octa (), MMIX-ARITH §8.
overflow: bool, MMIX-ARITH §4.
pack_bytes = macro(), §320.
rA = 21, §52.
ren_a: bool, §44.
rG = 19, §52.
rP = 23, §52.
rR = 6, §52.
save = 77, §49.
SAVE = #fa, §47.
set = 33, §49.
shift_right: octa (), MMIX-ARITH §7.
signed_odiv: octa (), MMIX-ARITH §24.
signed_omult: octa (), MMIX-ARITH §12.
spec_install: static void (), §95.
true = 1, §11.
V_BIT = 1 ≪ 14, §54.
wait = macro(), §125.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
yy: unsigned char, §44.
z: spec, §44.
zz: unsigned char, §44.
```

**344.** Next let's polish off the bitwise and bytewise operations.

```
⟨ Cases to compute the results of register-to-register operation 137⟩ +≡
case sadd:
  data→x.o.l = count_bits(data→y.o.h & ~data→z.o.h) + count_bits(data→y.o.l & ~data→z.o.l);
  break;
case mor: data→x.o = bool_mult(data→y.o, data→z.o, data→op & #2); break;
case bdif: data→x.o.h = byte_diff(data→y.o.h, data→z.o.h);
  data→x.o.l = byte_diff(data→y.o.l, data→z.o.l); break;
case wdif: data→x.o.h = wyde_diff(data→y.o.h, data→z.o.h);
  data→x.o.l = wyde_diff(data→y.o.l, data→z.o.l); break;
case tdif: if (data→y.o.h > data→z.o.h) data→x.o.h = data→y.o.h - data→z.o.h;
tdif_l: if (data→y.o.l > data→z.o.l) data→x.o.l = data→y.o.l - data→z.o.l; break;
case odif: if (data→y.o.h > data→z.o.h) data→x.o = ominus(data→y.o, data→z.o);
  else if (data→y.o.h ≡ data→z.o.h) goto tdif_l;
  break;
```
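The saturating ("monus") differences and the sideways add above can be illustrated in plain C. This is a minimal sketch, not the MMIX-ARITH code: the names `byte_diff_sketch`, `sadd_sketch`, and `odif_sketch` are hypothetical, and 64-bit integers stand in for the simulator's two-tetrabyte octa.

```c
#include <stdint.h>

/* Bytewise saturating difference of two 32-bit words: each of the four
   byte lanes receives max(0, y_byte - z_byte). Illustrative only; the
   simulator calls byte_diff from MMIX-ARITH instead. */
static uint32_t byte_diff_sketch(uint32_t y, uint32_t z)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t yb = (y >> shift) & 0xff;
        uint32_t zb = (z >> shift) & 0xff;
        if (yb > zb) result |= (yb - zb) << shift;
    }
    return result;
}

/* SADD: count the bit positions where y has a 1 and z has a 0. */
static int sadd_sketch(uint64_t y, uint64_t z)
{
    uint64_t v = y & ~z;
    int count = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* ODIF: saturating difference of two unsigned 64-bit values. */
static uint64_t odif_sketch(uint64_t y, uint64_t z)
{
    return y > z ? y - z : 0;
}
```

The simulator's `odif` case reaches the same result with 32-bit halves: it subtracts whole octabytes when the high halves already decide the comparison, and falls into the `tdif_l` code when the high halves are equal.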

**345.** The conditional set (CS) instructions are, rather surprisingly, more difficult to implement than the zero set (ZS) instructions, although the ZS instructions do more. The reason is that dynamic instruction dependencies are more complicated with CS. Consider, for example, the instructions

```
LDO x,a,b; FDIV y,c,d; CSZ y,x,0; INCL y,1.
```

If the value of x is zero, the INCL instruction need not wait for the division to be completed. (We do not, however, abort the division in such a case; it might invoke a trip handler, or change the inexact bit, etc. Our policy is to treat common cases efficiently and to treat all cases correctly, but not to treat all cases with maximum efficiency.)

```
⟨ Cases to compute the results of register-to-register operation 137⟩ +≡
case zset: if (register_truth(data→y.o, data→op)) data→x.o = data→z.o;
    /* otherwise data→x.o is already zero */
  goto fin_ex;
case cset: if (register_truth(data→y.o, data→op)) data→x.o = data→z.o, data→b.p = Λ;
  else if (data→b.p ≡ Λ) data→x.o = data→b.o;
  else {
    data→state = 0; data→need_b = true; goto switch1;
  } break;
```
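The difference in data dependencies between the two instruction families can be seen in a small C sketch. The function names are hypothetical; the point is only that the conditional set needs the previous value of its destination as an extra input (the operand called *b* in the simulator), while the zero set does not.

```c
#include <stdint.h>
#include <stdbool.h>

/* Zero-or-set: the result never depends on the old value of X. */
static uint64_t zset_sketch(bool cond, uint64_t z)
{
    return cond ? z : 0;
}

/* Conditional set: when the condition fails, X keeps its previous value,
   so the old value is a third input that may still be in flight. */
static uint64_t cset_sketch(bool cond, uint64_t z, uint64_t old_x)
{
    return cond ? z : old_x;
}
```

In the `CSZ y,x,0` example, the old value of y is the pending quotient of `FDIV`; when x is zero the condition holds and that quotient is never consulted, which is why `INCL` can proceed early.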

**346.** Floating point computations are mostly handled by the routines in MMIX-ARITH, which record anomalous events in the global variable *exceptions*. But we consider the operation trivial if an input is infinite or NaN; and we may need to increase the execution time when subnormals are present.

```
#define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4
#define is_subnormal(x) ((x.h & #7ff00000) ≡ 0 ∧ ((x.h & #fffff) ∨ x.l))
#define is_trivial(x) ((x.h & #7ff00000) ≡ #7ff00000)
#define set_round cur_round = (data→ra.o.l < #10000 ? ROUND_NEAR : data→ra.o.l ≫ 16)
```

```
⟨ Cases to compute the results of register-to-register operation 137⟩ +≡
case fadd: set_round; data→x.o = fplus(data→y.o, data→z.o);
fin_bflot: if (is_subnormal(data→y.o)) data→denin = denin_penalty;
fin_uflot: if (is_subnormal(data→x.o)) data→denout = denout_penalty;
fin_flot: if (is_subnormal(data→z.o)) data→denin = denin_penalty;
  data→interrupt |= exceptions;
  if (is_trivial(data→y.o) ∨ is_trivial(data→z.o)) goto fin_ex;
  if (data→i ≡ fsqrt ∧ (data→z.o.h & sign_bit)) goto fin_ex;
  break;
case fsub: data→a.o = data→z.o;
  if (fcomp(data→z.o, zero_octa) ≠ 2) data→a.o.h ⊕= sign_bit;
  set_round; data→x.o = fplus(data→y.o, data→a.o);
  data→i = fadd;   /* use pipeline times for addition */
  goto fin_bflot;
case fmul: set_round; data→x.o = fmult(data→y.o, data→z.o); goto fin_bflot;
case fdiv: set_round; data→x.o = fdivide(data→y.o, data→z.o); goto fin_bflot;
case fsqrt: set_round; data→x.o = froot(data→z.o, data→y.o.l); goto fin_uflot;
case fint: set_round; data→x.o = fintegerize(data→z.o, data→y.o.l); goto fin_uflot;
case fix: set_round; data→x.o = fixit(data→z.o, data→y.o.l);
  if (data→op & #2) exceptions &= ~W_BIT;   /* unsigned case doesn't overflow */
  goto fin_flot;
case flot: set_round; data→x.o = floatit(data→z.o, data→y.o.l, data→op & #2, data→op & #4);
  data→interrupt |= exceptions; break;
```

**347.**

```
⟨ Special cases of instruction dispatch 117⟩ +≡
case fsqrt: case fint: case fix: case flot: if (cool→y.o.l > 4) goto illegal_inst;
  break;
```
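The classification macros of section 346 test the exponent and fraction fields of a double-precision number held as two 32-bit halves. The following C sketch restates them as functions; the type `octa_sketch` and the function names are assumptions made for illustration, not the simulator's own declarations.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_sketch;  /* high and low tetrabytes */

enum { ROUND_OFF = 1, ROUND_UP, ROUND_DOWN, ROUND_NEAR };

/* Exponent field zero but fraction nonzero: a subnormal number. */
static int is_subnormal_sketch(octa_sketch x)
{
    return (x.h & 0x7ff00000) == 0 && ((x.h & 0xfffff) || x.l);
}

/* Exponent field all ones: an infinity or a NaN. */
static int is_trivial_sketch(octa_sketch x)
{
    return (x.h & 0x7ff00000) == 0x7ff00000;
}

/* Mirror of set_round: bits 16 and above of the low half of rA select
   the rounding mode; if they are all zero, round to nearest. */
static int round_mode_sketch(uint32_t ra_low)
{
    return ra_low < 0x10000 ? ROUND_NEAR : (int)(ra_low >> 16);
}
```

Zero is neither subnormal nor trivial under these tests, so an operation on zero incurs no subnormal penalty.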

```
a: specnode, §44.
                                   fin_ex: label, §144.
                                                                      o: octa, §40.
b: spec, §44.
                                   fint = 18, \S 49.
                                                                      odif = 51, \S 49.
bdif = 48, \S 49.
                                   fintegerize: octa (),
                                                                      ominus: octa (),
bool_mult: octa (),
                                    MMIX-ARITH §86.
                                                                       MMIX-ARITH §5.
  MMIX-ARITH §29.
                                   fix = 19, \S 49.
                                                                      op: mmix_opcode, §44.
byte_diff: tetra (),
                                   fixit: octa (), mmix-arith §88.
                                                                     p: specnode *, §40.
  MMIX-ARITH §27.
                                  floatit: octa (),
                                                                      ra: spec, §44.
cool: control *, §60.
                                    MMIX-ARITH §89.
                                                                      register_truth: static int (),
count_bits: int (),
                                   flot = 20, \S 49.
                                                                        §157.
  MMIX-ARITH §26.
                                   fmul = 15, \S 49.
                                                                      sadd = 12, \S 49.
cset = 53, \S 49.
                                   fmult: octa (),
                                                                      sign\_bit = macro, \S 80.
cur_round: int,
                                     MMIX-ARITH §41.
                                                                      state: int, §44.
  MMIX-ARITH §30.
                                   fplus: octa (),
                                                                      switch1: label, §130.
data: register control *,
                                     MMIX-ARITH §46.
                                                                      tdif = 50, \S 49.
                                                                      true=1, \ \S 11.
  §124.
                                   froot: octa (),
denin: int, §44.
                                                                      W_BIT = 1 \ll 13, \S 54.
                                     MMIX-ARITH §91.
                                                                      wdif = 49, \S 49.
denin\_penalty: int, §349.
                                   fsqrt = 17, \S 49.
denout: int, §44.
                                   fsub = 24, \S 49.
                                                                      wyde\_diff: \mathbf{tetra} (),
denout_penalty: int, §349.
                                   h: tetra, §17.
                                                                       MMIX-ARITH §28.
exceptions: int,
                                   i: internal_opcode, §44.
                                                                      x: specnode, §44.
  MMIX-ARITH §32.
                                   illegal_inst: label, §118.
                                                                      y: spec, §44.
                                                                      z: spec, §44.
fadd = 14, \S 49.
                                   interrupt: unsigned int, §44.
fcomp: int (), MMIX-ARITH §85. l: tetra, §17.
                                                                      zero_octa: octa,
                                   mor = 13, \S 49.
fdiv = 16, \S 49.
                                                                       MMIX-ARITH §4.
fdivide: octa (),
                                   need_b: bool, §44.
                                                                      zset = 52, §49.
  MMIX-ARITH §44.
```

**348.**

```
⟨ Cases to compute the results of register-to-register operation 137⟩ +≡
case feps: j = fepscomp(data→y.o, data→z.o, data→b.o, data→op ≠ FEQLE);
  if (j ≡ 2) data→i = fcmp;
  else if (is_subnormal(data→y.o) ∨ is_subnormal(data→z.o)) data→denin = denin_penalty;
  switch (data→op) {
  case FUNE: if (j ≡ 2) goto cmp_pos; else goto cmp_zero;
  case FEQLE: goto cmp_fin;
  case FCMPE: if (j) goto cmp_zero_or_invalid;
  }
case fcmp: j = fcomp(data→y.o, data→z.o);
  if (j < 0) goto cmp_neg;
cmp_fin: if (j ≡ 1) goto cmp_pos;
cmp_zero_or_invalid: if (j ≡ 2) data→interrupt |= I_BIT;
  goto cmp_zero;
case funeq: if (fcomp(data→y.o, data→z.o) ≡ (data→op ≡ FUN ? 2 : 0)) goto cmp_pos;
  else goto cmp_zero;
```

**349.**

```
⟨ External variables 4⟩ +≡
  Extern int frem_max;
  Extern int denin_penalty, denout_penalty;
```

**350.** The floating point remainder operation is especially interesting because it can be interrupted when it's in the hot seat.

```
⟨ Cases to compute the results of register-to-register operation 137⟩ +≡
case frem: if (is_trivial(data→y.o) ∨ is_trivial(data→z.o)) {
    data→x.o = fremstep(data→y.o, data→z.o, 2500);
    data→interrupt |= exceptions; goto fin_ex;
  }
  if ((self + 1)→next) wait(1);
  data→interim = true;
  j = 1;
  if (is_subnormal(data→y.o) ∨ is_subnormal(data→z.o)) j += denin_penalty;
  pass_after(j);
  goto passit;
```

**351.**

```
⟨ Begin execution of a stage-two operation 351⟩ ≡
  j = 1;
  if (data→i ≡ frem) {
    data→x.o = fremstep(data→y.o, data→z.o, frem_max);
    if (exceptions & E_BIT) {
      data→y.o = data→x.o;
      if (trying_to_interrupt ∧ data ≡ old_hot) goto fin_ex;
    } else {
      data→state = 3;
      data→interim = false;
      data→interrupt |= exceptions;
      if (is_subnormal(data→x.o)) j += denout_penalty;
    }
    wait(j);
  }
```

This code is used in section 135.

```
b: spec, §44.
                                      MMIX-ARITH §50.
                                                                       next: coroutine *, §23.
cmp\_neg: label, §143.
                                   FEQLE = \# 13, §47.
                                                                       o: octa, §40.
                                   fin_ex: label, §144.
cmp\_pos: label, §143.
                                                                       old\_hot: control *, §60.
cmp_zero: label, §143.
                                   frem = 25, \S 49.
                                                                       op: mmix_opcode, §44.
data: register control *,
                                   fremstep: octa (),
                                                                       pass\_after = macro(), \S 125.
  §124.
                                     MMIX-ARITH §93.
                                                                       passit: label, §134.
denin: int, §44.
                                   FUN = {}^{\#}02, \S 47.
                                                                       self: register coroutine *,
{\tt E\_BIT} = 1 \ll 18, \, \S 54.
                                   FUNE = ^{\#}12, \S47.
                                                                         §124.
exceptions: int,
                                   funeq = 23, \S 49.
                                                                       state: int, §44.
                                   i: internal_opcode, §44.
  MMIX-ARITH §32.
                                                                       true = 1, \S 11.
                                   I_BIT = 1 \ll 12, \S 54.
                                                                       trying_to_interrupt: bool,
Extern = macro, \S 4.
false = 0, \S 11.
                                   interim: bool, §44.
                                                                         §315.
                                                                       wait = macro(), \S 125.
fcmp = 22, \S 49.
                                   interrupt: unsigned int, §44.
FCMPE = #11, §47.
                                   is\_subnormal = macro(), §346.
                                                                       x: specnode, §44.
fcomp: int (), MMIX-ARITH §85. is_trivial = macro (), §346.
                                                                       y: spec, §44.
feps = 21, \S 49.
                                   j: register int, §12.
                                                                       z: spec, \S44.
fepscomp: int (),
```

**352. System operations.** Finally we need to implement some operations for the operating system; then the hardware simulation will be done!

A LDVTS instruction is delayed until it reaches the hot seat, because it changes the IT and DT caches. The operating system should use SYNC after LDVTS if the effects are needed immediately; the system is also responsible for ensuring that the page table permission bits agree with the LDVTS permission bits when the latter are nonzero. (Also, if write permission is taken away from a page, the operating system must have previously used SYNCD to write out any dirty bytes that might have been cached from that page; SYNCD will be inoperative after write permission goes away.)

```
⟨ Handle special cases for operations like prego and ldvts 289⟩ +≡
  if (data→i ≡ ldvts) ⟨ Do stage 1 of LDVTS 353⟩;
```

**353.**

```
⟨ Do stage 1 of LDVTS 353⟩ ≡
  {
    if (data ≠ old_hot) wait(1);
    if (DTcache→lock ∨ (j = get_reader(DTcache)) < 0) wait(1);
    startup(&DTcache→reader[j], DTcache→access_time);
    data→z.o.h = 0, data→z.o.l = data→y.o.l & #7;
    p = cache_search(DTcache, data→y.o);   /* N.B.: Not trans_key(data→y.o) */
    if (p) {
      data→x.o.l = 2;
      if (data→z.o.l) {
        p = use_and_fix(DTcache, p);
        p→data[0].l = (p→data[0].l & -8) + data→z.o.l;
      } else {
        p = demote_and_fix(DTcache, p);
        p→tag.h |= sign_bit;   /* invalidate the tag */
      }
    }
    pass_after(DTcache→access_time); goto passit;
  }
```

This code is used in section 352.

**354.**

```
⟨ Special cases for states in later stages 272⟩ +≡
case ld_st_launch: if (ITcache→lock ∨ (j = get_reader(ITcache)) < 0) wait(1);
  startup(&ITcache→reader[j], ITcache→access_time);
  p = cache_search(ITcache, data→y.o);   /* N.B.: Not trans_key(data→y.o) */
  if (p) {
    data→x.o.l |= 1;
    if (data→z.o.l) {
      p = use_and_fix(ITcache, p);
      p→data[0].l = (p→data[0].l & -8) + data→z.o.l;
    } else {
      p = demote_and_fix(ITcache, p);
      p→tag.h |= sign_bit;   /* invalidate the tag */
    }
  }
  data→state = 3; wait(ITcache→access_time);
```
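The treatment of a translation cache hit by LDVTS can be sketched in isolation. In this hedged illustration the structure `tc_entry_sketch` and the function `ldvts_update_sketch` are hypothetical stand-ins for a cache block's tag and first data word; only the bit manipulation is taken from the code above.

```c
#include <stdint.h>

#define SIGN_BIT_SKETCH 0x80000000u

typedef struct {
    uint32_t tag_h;    /* high half of the tag; sign bit set means invalid */
    uint32_t data_l;   /* low half of the entry; low 3 bits are permissions */
} tc_entry_sketch;

/* Apply the low three bits of the LDVTS operand to a cached translation:
   nonzero bits replace the permission bits, zero invalidates the entry. */
static void ldvts_update_sketch(tc_entry_sketch *e, uint32_t y_low)
{
    uint32_t perm = y_low & 0x7;
    if (perm)
        e->data_l = (e->data_l & ~(uint32_t)7) + perm;
    else
        e->tag_h |= SIGN_BIT_SKETCH;   /* invalidate the tag */
}
```

The simulator performs this step once for the DT-cache and once for the IT-cache, recording in the result which of the two caches held the key.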

**355.** The SYNC operation interacts with the pipeline in interesting ways. SYNC 0 and SYNC 4 are the simplest; they just lock the dispatch and wait until they get to the hot seat, after which the pipeline has drained. SYNC 1 and SYNC 3 put a "barrier" into the write buffer so that subsequent store instructions will not merge with previous stores. SYNC 2 and SYNC 3 lock the dispatch until all previous load instructions have left the pipeline. SYNC 5, SYNC 6, and SYNC 7 remove things from caches once they get to the hot seat.

```
⟨ Special cases of instruction dispatch 117⟩ +≡
case sync: if (cool→zz > 3) {
    if (¬(cool→loc.h & sign_bit)) goto privileged_inst;
    if (cool→zz ≡ 4) freeze_dispatch = true;
  } else {
    if (cool→zz ≠ 1) freeze_dispatch = true;
    if (cool→zz & 1) cool→mem_x = true, spec_install(&mem, &cool→x);
  } break;
```

**356.**

```
⟨ Cases for stage 1 execution 155⟩ +≡
case sync: switch (data→zz) {
  case 0: case 4: if (data ≠ old_hot) wait(1);
    halted = (data→zz ≠ 0); goto fin_ex;
  case 2: case 3: ⟨ Wait if there's an unfinished load ahead of us 357⟩;
    release_lock(self, dispatch_lock);
  case 1: data→x.addr = zero_octa; goto fin_ex;
  case 5: if (data ≠ old_hot) wait(1);
    ⟨ Clean the data caches 361⟩;
  case 6: if (data ≠ old_hot) wait(1);
    ⟨ Zap the translation caches 358⟩;
  case 7: if (data ≠ old_hot) wait(1);
    ⟨ Zap the instruction and data caches 359⟩;
  }
```
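The dispatch-time decisions for SYNC can be summarized as a pure function of the ZZ field and of whether the instruction sits at a negative (operating system) address. This is a sketch under those assumptions; the structure and function names are hypothetical, and the real code sets simulator state instead of returning flags.

```c
#include <stdbool.h>

typedef struct {
    bool privileged_error;  /* SYNC 4..7 attempted from a user address */
    bool freeze_dispatch;   /* stop dispatching later instructions */
    bool write_barrier;     /* put a barrier into the write buffer */
} sync_dispatch_sketch;

static sync_dispatch_sketch sync_dispatch(unsigned zz, bool kernel_loc)
{
    sync_dispatch_sketch r = { false, false, false };
    if (zz > 3) {
        if (!kernel_loc) r.privileged_error = true;
        else if (zz == 4) r.freeze_dispatch = true;
    } else {
        if (zz != 1) r.freeze_dispatch = true;   /* SYNC 0, 2, 3 */
        if (zz & 1) r.write_barrier = true;      /* SYNC 1, 3 */
    }
    return r;
}
```

SYNC 5, 6, and 7 set neither flag at dispatch time; their work is deferred until they reach the hot seat.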

```
access_time: int, §167.
addr: \mathbf{octa}, \S 40.
cache_search: static
  cacheblock *(), §193.
cool: control *, §60.
data: register control *,
  §124.
data: octa *, §167.
demote_and_fix: static
  cacheblock *(), §199.
dispatch_lock: lockvar, §65.
DTcache: cache *, §168.
fin_ex: label, §144.
freeze_dispatch: register
  bool, §75.
get_reader: static int (), §183.
h: tetra, §17.
halted: bool, §12.
i: internal_opcode, §44.
ITcache: cache *, §168.
```

```
j: register int, §12.
l: tetra, §17.
ld\_st\_launch = 7, \S 265.
ldvts = 60, \S 49.
loc: octa, §44.
lock: lockvar, §167.
mem: \mathbf{specnode}, \S 115.
mem_x: bool, §44.
o: octa, §40.
old_hot: control *, \S60.
p: register cacheblock *,
pass\_after = macro(), \S 125.
passit: label, §134.
prego = 73, \S 49.
privileged_inst: label, §118.
reader: coroutine *, §167.
release\_lock = macro(), §37.
self: register coroutine *,
```

```
§124.
sign\_bit = macro, \S 80.
spec_install: static void (),
startup: static void (), §31.
state: int, §44.
sync = 79, \S 49.
tag: octa, §167.
trans_key = macro(), §240.
true = 1, \S 11.
use_and_fix: static
  cacheblock *(), §196.
wait = macro(), \S 125.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa,
 MMIX-ARITH §4.
zz: unsigned char, §44.
```

**357.**

```
⟨ Wait if there's an unfinished load ahead of us 357⟩ ≡
  {
    register control *cc;
    for (cc = data; cc ≠ hot;) {
      cc = (cc ≡ reorder_top ? reorder_bot : cc + 1);
      if (cc→owner ∧ (cc→i ≡ ld ∨ cc→i ≡ ldunc ∨ cc→i ≡ pst)) wait(1);
    }
  }
```

This code is used in section 356.

**358.** Perhaps the delay should be longer here.

```
⟨ Zap the translation caches 358⟩ ≡
  if (DTcache→lock ∨ (j = get_reader(DTcache)) < 0) wait(1);
  startup(&DTcache→reader[j], DTcache→access_time);
  set_lock(self, DTcache→lock);
  zap_cache(DTcache);
  data→state = 10; wait(DTcache→access_time);
```

This code is used in section 356.

**359.**

```
⟨ Zap the instruction and data caches 359⟩ ≡
  if (¬Icache) {
    data→state = 11; goto switch1;
  }
  if (Icache→lock ∨ (j = get_reader(Icache)) < 0) wait(1);
  startup(&Icache→reader[j], Icache→access_time);
  set_lock(self, Icache→lock);
  zap_cache(Icache);
  data→state = 11; wait(Icache→access_time);
```

This code is used in section 356.

**360.**

```
⟨ Special cases for states in the first stage 266⟩ +≡
case 10: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  if (ITcache→lock ∨ (j = get_reader(ITcache)) < 0) wait(1);
  startup(&ITcache→reader[j], ITcache→access_time);
  set_lock(self, ITcache→lock);
  zap_cache(ITcache);
  data→state = 3; wait(ITcache→access_time);
case 11: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  if (wbuf_lock) wait(1);
  write_head = write_tail, write_ctl.state = 0;   /* zap the write buffer */
  if (¬Dcache) {
    data→state = 12; goto switch1;
  }
  if (Dcache→lock ∨ (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache→reader[j], Dcache→access_time);
  set_lock(self, Dcache→lock);
  zap_cache(Dcache);
  data→state = 12; wait(Dcache→access_time);
case 12: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  if (¬Scache) goto fin_ex;
  if (Scache→lock) wait(1);
  set_lock(self, Scache→lock);
  zap_cache(Scache);
  data→state = 3; wait(Scache→access_time);
```

**361.**

```
⟨ Clean the data caches 361⟩ ≡
  if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  ⟨ Wait till write buffer is empty 362⟩;
  if (clean_co.next ∨ clean_lock) wait(1);
  set_lock(self, clean_lock);
  clean_ctl.i = sync; clean_ctl.state = 0; clean_ctl.x.o.h = 0;
  startup(&clean_co, 1);
  data→state = 13;
  data→interim = true;
  wait(1);
```

This code is used in section 356.

**362.**

```
⟨ Wait till write buffer is empty 362⟩ ≡
  if (write_head ≠ write_tail) {
    if (¬speed_lock) set_lock(self, speed_lock);
    wait(1);
  }
```

This code is used in sections 361 and 364.

**363.** The cleanup process might take a huge amount of time, so we must allow it to be interrupted. (Servicing the interruption might, of course, put more stuff into the cache.)

```
⟨ Special cases for states in the first stage 266⟩ +≡
case 13: if (¬clean_co.next) {
    data→interim = false; goto fin_ex; /* it's done! */
}
if (trying_to_interrupt) goto fin_ex; /* accept an interruption */
wait(1);
```

```
access_time: int, §167.
clean_co: coroutine, §230.
clean_ctl: control, §230.
clean_lock: lockvar, §230.
control = struct, \S 44.
data: register control *,
  §124.
Dcache: cache *, §168.
DTcache: cache *, §168.
false = 0, \S 11.
fin_ex: label, §144.
get_reader: static int (), §183.
h: tetra, §17.
hot: \mathbf{control} *, \S 60.
i: internal_opcode, §44.
Icache: cache *, §168.
interim: bool, §44.
```

```
ITcache: cache *, §168.
j: register int, §12.
ld = 56, \S 49.
ldunc = 59, \S 49.
lock: lockvar, §167.
lockloc: coroutine **, \S 23.
next: coroutine *, \S 23.
o: octa, §40.
owner: coroutine *, §44.
pst = 66, \S 49.
reader: coroutine *, §167.
reorder_bot: control *, §60.
reorder_top: control *, §60.
Scache: cache *, §168.
self: register coroutine *,
  §124.
set\_lock = macro(), \S 37.
```

speed\_lock: lockvar, §247. startup: static void (), §31. state: int, §44. switch1: label, §130. sync = 79, §49. true = 1, §11. trying\_to\_interrupt: bool, §315. wait = macro (), §125. wbuf\_lock: lockvar, §247. write\_ctl: control, §248. write\_head: write\_node \*, §247. write\_tail: write\_node \*, §247. x: specnode, §44. zap\_cache: void (), §181.

**364.** Now we consider SYNCD and SYNCID. When control comes to this part of the program, *data→y.o* is a virtual address and *data→z.o* is the corresponding physical address; *data→xx* + 1 is the number of bytes we are supposed to be syncing; *data→b.o.l* is the number of bytes we can handle at once (either *Icache→bb* or *Dcache→bb* or 8192).

We need a more elaborate scheme to implement SYNCD and SYNCID than we have used for the "hint" instructions PRELD, PREGO, and PREST, because SYNCD and SYNCID are not merely hints. They cannot be converted into a sequence of cache-block-size commands at dispatch time, because we cannot be sure that the starting virtual address will be aligned with the beginning of a cache block. We need to realize that the bytes specified by SYNCD or SYNCID might cross a virtual page boundary—possibly with different protection bits on each page. We need to allow for interrupts. And we also need to keep the fetch buffer empty until a user's SYNCID has completely brought the memory up to date.
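The alignment problem described above comes down to simple arithmetic on power-of-two block sizes. The following C sketch, with hypothetical function names, shows how many bytes remain in the block containing a given address and how a byte range is consumed one block at a time; it ignores caches, interrupts, and page protection entirely.

```c
#include <stdint.h>

/* Number of bytes from address y up to the end of the block containing
   it, for a block size bb that is a power of two. */
static uint32_t bytes_left_in_block(uint32_t y, uint32_t bb)
{
    return ((bb - 1) & ~y) + 1;
}

/* Count the block-sized steps needed to cover count bytes starting at y;
   an unaligned start can make a short range touch two blocks. */
static int blocks_touched(uint32_t y, uint32_t count, uint32_t bb)
{
    int steps = 0;
    while (count > 0) {
        uint32_t chunk = bytes_left_in_block(y, bb);
        if (chunk > count) chunk = count;
        y += chunk;       /* the next step starts on a block boundary */
        count -= chunk;
        steps++;
    }
    return steps;
}
```

For instance, two bytes starting at the last byte of a 64-byte block need two steps, which is why the number of cache-block commands cannot be fixed before the address is known.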

```
⟨ Special cases for states in later stages 272⟩ +≡
do_syncid: data→state = 30;
case 30: if (data ≠ old_hot) wait(1);
  if (¬Icache) {
    data→state = (data→loc.h & sign_bit ? 31 : 33); goto switch2;
  }
  ⟨ Clean the I-cache block for data→z.o, if any 365⟩;
  data→state = (data→loc.h & sign_bit ? 31 : 33); wait(Icache→access_time);
case 31: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  ⟨ Wait till write buffer is empty 362⟩;
  if (((data→b.o.l - 1) & ~data→y.o.l) < data→xx) data→interim = true;
  if (¬Dcache) goto next_sync;
  ⟨ Clean the D-cache block for data→z.o, if any 366⟩;
  data→state = 32; wait(Dcache→access_time);
case 32: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  if (¬Scache) goto next_sync;
  ⟨ Clean the S-cache block for data→z.o, if any 367⟩;
  data→state = 35; wait(Scache→access_time);
do_syncd: data→state = 33;
case 33: if (data ≠ old_hot) wait(1);
  if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  ⟨ Wait till write buffer is empty 362⟩;
  if (((data→b.o.l - 1) & ~data→y.o.l) < data→xx) data→interim = true;
  if (¬Dcache)
    if (data→i ≡ syncd) goto fin_ex; else goto next_sync;
  ⟨ Use cleanup on the cache blocks for data→z.o, if any 368⟩;
  data→state = 34;
case 34: if (¬clean_co.next) goto next_sync;
  if (trying_to_interrupt ∧ data→interim ∧ data ≡ old_hot) {
    data→z.o = zero_octa;   /* anticipate RESUME_CONT */
    goto fin_ex;   /* accept an interruption */
  }
  wait(1);
next_sync: data→state = 35;
case 35: if (self→lockloc) *(self→lockloc) = Λ, self→lockloc = Λ;
  if (data→interim) ⟨ Continue this command on the next cache block 369⟩;
  data→go.known = true;
  goto fin_ex;
```

**365.**

```
⟨ Clean the I-cache block for data→z.o, if any 365⟩ ≡
  if (Icache→lock ∨ (j = get_reader(Icache)) < 0) wait(1);
  startup(&Icache→reader[j], Icache→access_time);
  set_lock(self, Icache→lock);
  p = cache_search(Icache, data→z.o);
  if (p) {
    demote_and_fix(Icache, p);
    clean_block(Icache, p);
  }
```

This code is used in section 364.

**366.**

```
⟨ Clean the D-cache block for data→z.o, if any 366⟩ ≡
  if (Dcache→lock ∨ (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache→reader[j], Dcache→access_time);
  set_lock(self, Dcache→lock);
  p = cache_search(Dcache, data→z.o);
  if (p) {
    demote_and_fix(Dcache, p);
    clean_block(Dcache, p);
  }
```

This code is used in section 364.

**367.**

```
⟨ Clean the S-cache block for data→z.o, if any 367⟩ ≡
  if (Scache→lock) wait(1);
  set_lock(self, Scache→lock);
  p = cache_search(Scache, data→z.o);
  if (p) {
    demote_and_fix(Scache, p);
    clean_block(Scache, p);
  }
```

This code is used in section 364.

```
get_reader: static int (), §183.
go: specnode, §44.
h: tetra, §17.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
access\_time: int, §167.
                                     i: internal_opcode, §44.
                                                                           self: register coroutine *,
b: spec, §44.
                                      Icache: cache *, §168.
                                                                             §124.
bb: int, §167.
                                      interim: bool, §44.
                                                                           set\_lock = macro(), \S 37.
cache_search: static
                                     j: register int, §12.
                                                                           sign\_bit = macro, \S 80.
  cacheblock *(), §193.
                                      known: bool, \S 40.
                                                                           startup: static void (), §31.
clean_block: void (), §179.
                                     l: tetra, §17.
                                                                           state: int, \S44.
                                                                           switch2: label, §135.
clean_co: coroutine, §230.
                                     loc: octa, §44.
                                      lock: lockvar, §167.
cleanup = 91, \S 129.
                                                                           syncd = 64, \S 49.
data: register control *,
                                      lockloc: coroutine **, §23.
                                                                           true = 1, \S 11.
  §124.
                                     next: coroutine *, §23.
                                                                           trying_to_interrupt: bool,
Dcache: cache *, §168.
                                     o: octa, §40.
                                                                             §315.
demote\_and\_fix\colon \mathbf{static}
                                     old\_hot: control *, §60.
                                                                           wait = macro(), \S 125.
  cacheblock *(), §199.
                                     p: register cacheblock *,
                                                                           xx: unsigned char, §44.
fin_ex: label, §144.
                                        §258.
                                                                           y: spec, §44.
```

reader: coroutine \*, §167.

RESUME\_CONT = 1, §320.

Scache: cache \*, §168.

```
368. ⟨Use cleanup on the cache blocks for data→z.o, if any 368⟩ ≡
  if (clean_co.next ∨ clean_lock) wait(1);
  set_lock(self, clean_lock);
  clean_ctl.i = syncd;
  clean_ctl.state = 4;
  clean_ctl.x.o.h = data→loc.h & sign_bit;
  clean_ctl.z.o = data→z.o;
  schedule(&clean_co, 1, 4);
```

This code is used in section 364.

**369.** We use the fact that cache block sizes are divisors of 8192.

```
⟨Continue this command on the next cache block 369⟩ ≡
  data→interim = false;
  data→xx = ((data→b.o.l - 1) & ~data→y.o.l) + 1;
  data→y.o = incr(data→y.o, data→b.o.l);
  data→y.o.l &= -data→b.o.l;
  data→z.o.l = (data→z.o.l & -8192) + (data→y.o.l & 8191);
  if ((data→y.o.l & 8191) ≡ 0) goto square_one;   /* maybe crossed a page boundary */
  if (data→i ≡ syncd) goto do_syncd; else goto do_syncid;
```

This code is used in section 364.

**370.** If the first page lacks proper protection, we still must try the second, in the rare case that a page boundary is spanned.

```
⟨Special cases for states in later stages 272⟩ +≡
sync_check: if ((data→y.o.l ⊕ (data→y.o.l + data→xx)) ≥ 8192) {
    data→xx = (8191 & ~data→y.o.l) + 1;
    data→y.o = incr(data→y.o, 8192);
    data→y.o.l &= -8192;
    goto square_one;
  }
  goto fin_ex;
```
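The address arithmetic in sections 369 and 370 relies on the block size being a power of two that divides 8192. The following is a minimal sketch of those mask expressions on plain 32-bit integers; the function names are hypothetical, and `b` stands in for whatever block size `data→b.o.l` holds.

```c
#include <stdint.h>

/* Bytes from address y through the end of its block of size b.
   Assumes b is a power of two. Mirrors ((b - 1) & ~y) + 1. */
static uint32_t bytes_to_block_end(uint32_t y, uint32_t b)
{
    return ((b - 1) & ~y) + 1;
}

/* First address of the block after the one containing y.
   For a power of two b, -b equals the mask ~(b - 1). */
static uint32_t next_block_start(uint32_t y, uint32_t b)
{
    return (y + b) & -b;
}

/* Nonzero if the next block begins a new 8192-byte page. */
static int crosses_page(uint32_t y, uint32_t b)
{
    return (next_block_start(y, b) & 8191) == 0;
}
```

With 64-byte blocks, address `0x1234` has 12 bytes left in its block and the next block starts at `0x1240`; stepping on from `0x1fc0` lands on `0x2000`, which is the case the code sends back to `square_one`.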

**371. Input and output.** We're done implementing the hardware, but there's still a small matter of software remaining, because we sometimes want to pretend that a real operating system is present without actually having one loaded. This simulator therefore implements a special feature: If RESUME 1 is issued in location rT, the ten special I/O traps of MMIX-SIM are performed instantaneously behind the scenes.

Of course all claims of accurate simulation go out the door when this feature is used.

```
#define max_sys_call Ftell
⟨Type definitions 11⟩ +≡
  typedef enum {
    Halt, Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell
  } sys_call;
```
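The enumeration values double as the Y byte of a TRAP instruction, and section 372 picks the Y and Z bytes out of the low half of rXX. Here is a small sketch of that decoding on a bare 32-bit instruction word. The function `decode_trap` and the parameter name `handle` are illustrative assumptions, not part of MMIXware.

```c
#include <stdint.h>

typedef enum {
    Halt, Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell
} sys_call;
#define max_sys_call Ftell

/* Splits a 32-bit TRAP instruction word into its Y and Z bytes.
   Returns 1 and fills *call and *handle if the word names one of the
   special calls; returns 0 otherwise, leaving the outputs untouched. */
static int decode_trap(uint32_t inst, sys_call *call, unsigned char *handle)
{
    unsigned char yy, zz;
    if (inst & 0xffff0000u) return 0;   /* opcode and X bytes must both be zero */
    yy = (unsigned char)(inst >> 8);
    zz = (unsigned char)(inst & 0xff);
    if (yy > max_sys_call) return 0;    /* not one of the eleven codes */
    *call = (sys_call)yy;
    *handle = zz;
    return 1;
}
```

For example, the word `0x00000301` decodes to `Fread` with Z byte 1, while `0x00000b00` is rejected because 11 exceeds `max_sys_call`.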

```
b: spec, §44.
clean_co: coroutine, §230.
clean_ctl: control, §230.
clean_lock: lockvar, §230.
cleanup = 91, §129.
data: register control *, §124.
do_syncd: label, §364.
do_syncid: label, §364.
false = 0, §11.
fin_ex: label, §144.
```

```
h: tetra, §17.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
interim: bool, §44.
l: tetra, §17.
loc: octa, §44.
next: coroutine *, §23.
o: octa, §40.
schedule: static void (), §28.
self: register coroutine *, §124.
```

```
set_lock = macro (), §37.
sign_bit = macro, §80.
square_one: label, §272.
state: int, §44.
syncd = 64, §49.
wait = macro (), §125.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
```

```
372. ⟨Magically do an I/O operation, if cool→loc is rT 372⟩ ≡
  if (cool→loc.l ≡ g[rT].o.l ∧ cool→loc.h ≡ g[rT].o.h) {
    register unsigned char yy, zz;
    octa ma, mb;
    if (g[rXX].o.l & #ffff0000) goto magic_done;
    yy = g[rXX].o.l ≫ 8, zz = g[rXX].o.l & #ff;
    if (yy > max_sys_call) goto magic_done;
    ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩;
    switch (yy) {
    case Halt: ⟨Either halt or print warning 373⟩; break;
    case Fopen: g[rBB].o = mmix_fopen(zz, mb, ma); break;
    case Fclose: g[rBB].o = mmix_fclose(zz); break;
    case Fread: g[rBB].o = mmix_fread(zz, mb, ma); break;
    case Fgets: g[rBB].o = mmix_fgets(zz, mb, ma); break;
    case Fgetws: g[rBB].o = mmix_fgetws(zz, mb, ma); break;
    case Fwrite: g[rBB].o = mmix_fwrite(zz, mb, ma); break;
    case Fputs: g[rBB].o = mmix_fputs(zz, g[rBB].o); break;
    case Fputws: g[rBB].o = mmix_fputws(zz, g[rBB].o); break;
    case Fseek: g[rBB].o = mmix_fseek(zz, g[rBB].o); break;
    case Ftell: g[rBB].o = mmix_ftell(zz); break;
    }
  magic_done: g[255].o = neg_one;   /* this will enable interrupts */
  }
```

This code is used in section 322.

```
373. ⟨Either halt or print warning 373⟩ ≡
  if (¬zz) halted = true;
  else if (zz ≡ 1) {
    octa trap_loc;
    trap_loc = incr(g[rWW].o, -4);
    if (¬(trap_loc.h ∨ trap_loc.l ≥ #f0))
      print_trip_warning(trap_loc.l ≫ 4, incr(g[rW].o, -4));
  }
```

This code is used in section 372.

```
374. ⟨Global variables 20⟩ +≡
  char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};
```

**375.** The input/output operations invoked by TRAPs are done by subroutines in an auxiliary program module called MMIX-IO. Here we need only declare those subroutines, and write three primitive interfaces on which they depend.

```
376. ⟨Global variables 20⟩ +≡
  extern octa mmix_fopen ARGS((unsigned char, octa, octa));
  extern octa mmix_fclose ARGS((unsigned char));
  extern octa mmix_fread ARGS((unsigned char, octa, octa));
  extern octa mmix_fgets ARGS((unsigned char, octa, octa));
  extern octa mmix_fgetws ARGS((unsigned char, octa, octa));
  extern octa mmix_fwrite ARGS((unsigned char, octa, octa));
  extern octa mmix_fputs ARGS((unsigned char, octa));
```

```
ARGS = macro, §6.
cool: control *, §60.
Fclose = 2, §371.
Fgets = 4, §371.
Fgetws = 5, §371.
Fopen = 1, §371.
Fputs = 7, §371.
Fputws = 8, §371.
Fread = 3, §371.
Fseek = 9, §371.
Ftell = 10, §371.
Fwrite = 6, §371.
g: specnode [], §86.
h: tetra, §17.
Halt = 0, §371.
halted: bool, §12.
incr: octa (), MMIX-ARITH §6.
l: tetra, §17.
loc: octa, §44.
max_sys_call = macro, §371.
mmix_fclose: octa (), MMIX-IO §11.
mmix_fgets: octa (), MMIX-IO §14.
mmix_fgetws: octa (), MMIX-IO §16.
mmix_fopen: octa (), MMIX-IO §8.
mmix_fputs: octa (), MMIX-IO §19.
mmix_fputws: octa (), MMIX-IO §20.
mmix_fread: octa (), MMIX-IO §12.
mmix_fseek: octa (), MMIX-IO §21.
mmix_ftell: octa (), MMIX-IO §22.
mmix_fwrite: octa (), MMIX-IO §18.
neg_one: octa, MMIX-ARITH §4.
o: octa, §40.
octa = struct, §17.
print_trip_warning: void (), MMIX-IO §23.
rBB = 7, §52.
rT = 13, §52.
rW = 24, §52.
rWW = 28, §52.
rXX = 29, §52.
true = 1, §11.
```

**378.** We need to cut through all the complications of buffers and caches in order to do magical I/O. The *magic\_read* routine finds the current octabyte in a given physical address by looking at the write buffer, D-cache, S-cache, and memory until finding it.

```
⟨Subroutines 14⟩ +≡
  octa magic_read(addr)
       octa addr;
  {
    register write_node *q;
    register cacheblock *p;
    for (q = write_tail; ; ) {
      if (q ≡ write_head) break;
      if (q ≡ wbuf_top) q = wbuf_bot; else q++;
      if ((q→addr.l & -8) ≡ (addr.l & -8) ∧ q→addr.h ≡ addr.h) return q→o;
    }
    if (Dcache) {
      p = cache_search(Dcache, addr);
      if (p) return p→data[(addr.l & (Dcache→bb - 1)) ≫ 3];
      if (((Dcache→outbuf.tag.l ⊕ addr.l) & -Dcache→bb) ≡ 0 ∧ Dcache→outbuf.tag.h ≡ addr.h)
        return Dcache→outbuf.data[(addr.l & (Dcache→bb - 1)) ≫ 3];
      if (Scache) {
        p = cache_search(Scache, addr);
        if (p) return p→data[(addr.l & (Scache→bb - 1)) ≫ 3];
        if (((Scache→outbuf.tag.l ⊕ addr.l) & -Scache→bb) ≡ 0 ∧ Scache→outbuf.tag.h ≡ addr.h)
          return Scache→outbuf.data[(addr.l & (Scache→bb - 1)) ≫ 3];
      }
    }
    return mem_read(addr);
  }
```

**379.** The magic_write routine changes the octabyte in a given physical address by changing it wherever it appears in a buffer or cache. Any "dirty" or "least recently used" status remains unchanged. (Yes, this is magic.)

```
⟨Subroutines 14⟩ +≡
  void magic_write(addr, val)
       octa addr, val;
  {
    register write_node *q;
    register cacheblock *p;
    for (q = write_tail; ; ) {
      if (q ≡ write_head) break;
      if (q ≡ wbuf_top) q = wbuf_bot; else q++;
      if ((q→addr.l & -8) ≡ (addr.l & -8) ∧ q→addr.h ≡ addr.h) q→o = val;
    }
    if (Dcache) {
      p = cache_search(Dcache, addr);
      if (p) p→data[(addr.l & (Dcache→bb - 1)) ≫ 3] = val;
      if (((Dcache→inbuf.tag.l ⊕ addr.l) & -Dcache→bb) ≡ 0 ∧ Dcache→inbuf.tag.h ≡ addr.h)
        Dcache→inbuf.data[(addr.l & (Dcache→bb - 1)) ≫ 3] = val;
```
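Both routines lean on three small address tests: whether two addresses share an octabyte, whether an address falls in the block named by a tag, and which octabyte of the block it selects. The sketch below isolates those expressions. The two-field `octa` follows the `h` and `l` halves used in the text, the function names are hypothetical, and `bb` is assumed to be a power of two.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;   /* high and low 32-bit halves */

/* Nonzero if a and b lie in the same aligned octabyte. */
static int same_octabyte(octa a, octa b)
{
    return (a.l & -8u) == (b.l & -8u) && a.h == b.h;
}

/* Nonzero if addr lies in the bb-byte block whose tag is `tag`. */
static int same_block(octa tag, octa addr, uint32_t bb)
{
    return ((tag.l ^ addr.l) & -bb) == 0 && tag.h == addr.h;
}

/* Index of the octabyte holding addr within its bb-byte block. */
static uint32_t octa_index(octa addr, uint32_t bb)
{
    return (addr.l & (bb - 1)) >> 3;
}
```

With 64-byte blocks, a tag of `0x1240` matches address `0x1268`, which selects octabyte 5 of the block; address `0x1280` belongs to the next block and fails the tag test.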

**380.** The conventions of our imaginary operating system require us to apply the trivial memory mapping in which segment i appears in a  $2^{32}$ -byte page of physical addresses starting at  $2^{32}i$ .

```
⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩ ≡
  if (arg_count[yy] ≡ 3) {
    octa arg_loc;
    arg_loc = g[rBB].o;
    if (arg_loc.h & #9fffffff) mb = zero_octa;
    else arg_loc.h ≫= 29, mb = magic_read(arg_loc);
    arg_loc = incr(g[rBB].o, 8);
    if (arg_loc.h & #9fffffff) ma = zero_octa;
    else arg_loc.h ≫= 29, ma = magic_read(arg_loc);
  }
```

This code is used in section 372.
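The mask `#9fffffff` and the shift by 29 implement the trivial mapping in two steps: reject any address that is negative or whose offset within its segment is $2^{32}$ or more, then move the two segment bits down so that segment $i$ becomes the high half $i$. A minimal sketch, with `virt_to_phys` as a hypothetical helper name:

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;   /* high and low 32-bit halves */

/* Trivial mapping: segment i occupies physical addresses starting at 2^32 * i.
   Returns 1 and stores the physical address in *p, or returns 0 if the
   virtual address v falls outside the mapped pages. */
static int virt_to_phys(octa v, octa *p)
{
    if (v.h & 0x9fffffffu) return 0;   /* sign bit set, or offset >= 2^32 */
    p->h = v.h >> 29;                  /* segment number, 0 through 3 */
    p->l = v.l;
    return 1;
}
```

For instance, virtual address `#2000000000000010` in segment 1 maps to physical address `#0000000100000010`.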

```
addr: octa, §246.
arg_count: char [], §374.
bb: int, §167.
cache_search: static cacheblock *(), §193.
cacheblock = struct, §167.
data: octa *, §167.
Dcache: cache *, §168.
g: specnode [], §86.
h: tetra, §17.
inbuf: cacheblock, §167.
incr: octa (), MMIX-ARITH §6.
l: tetra, §17.
ma: octa, §372.
mb: octa, §372.
mem_read: octa (), §210.
mem_write: void (), §213.
o: octa, §246.
o: octa, §40.
octa = struct, §17.
outbuf: cacheblock, §167.
rBB = 7, §52.
Scache: cache *, §168.
tag: octa, §167.
wbuf_bot: write_node *, §247.
wbuf_top: write_node *, §247.
write_head: write_node *, §247.
write_node = struct, §246.
write_tail: write_node *, §247.
yy: register unsigned char, §372.
zero_octa: octa, MMIX-ARITH §4.
```

**381.** The subroutine mmgetchars(buf, size, addr, stop) reads characters starting at address addr in the simulated memory and stores them in buf, continuing until size characters have been read or some other stopping criterion has been met. If stop < 0 there is no other criterion; if stop = 0 a null character will also terminate the process; otherwise addr is even, and two consecutive null bytes starting at an even address will terminate the process. The number of bytes read and stored, exclusive of terminating nulls, is returned.

```
⟨Subroutines 14⟩ +≡
  int mmgetchars(buf, size, addr, stop)
       char *buf;
       int size;
       octa addr;
       int stop;
  {
    register char *p;
    register int m;
    octa a, x;
    if (((addr.h & #9fffffff) ∨ (incr(addr, size - 1).h & #9fffffff)) ∧ size) {
      fprintf(stderr, "Attempt to get characters from off the page!\n");
      return 0;
    }
    for (p = buf, m = 0, a = addr, a.h ≫= 29; m < size; ) {
      x = magic_read(a);
      if ((a.l & #7) ∨ m > size - 8) ⟨Read and store one byte; return if done 382⟩
      else ⟨Read and store up to eight bytes; return if done 383⟩
    }
    return size;
  }
```

```
382. ⟨Read and store one byte; return if done 382⟩ ≡
  {
    if (a.l & #4) *p = (x.l ≫ (8 * ((~a.l) & #3))) & #ff;
    else *p = (x.h ≫ (8 * ((~a.l) & #3))) & #ff;
    if (¬*p ∧ stop ≥ 0) {
      if (stop ≡ 0) return m;
      if ((a.l & #1) ∧ *(p - 1) ≡ '\0') return m - 1;
    }
    p++, m++, a = incr(a, 1);
  }
```

This code is used in section 381.

```
383. ⟨Read and store up to eight bytes; return if done 383⟩ ≡
  {
    *p = x.h ≫ 24;
    if (¬*p ∧ (stop ≡ 0 ∨ (stop > 0 ∧ x.h < #10000))) return m;
    *(p + 1) = (x.h ≫ 16) & #ff;
    if (¬*(p + 1) ∧ stop ≡ 0) return m + 1;
    *(p + 2) = (x.h ≫ 8) & #ff;
    if (¬*(p + 2) ∧ (stop ≡ 0 ∨ (stop > 0 ∧ (x.h & #ffff) ≡ 0))) return m + 2;
    *(p + 3) = x.h & #ff;
    if (¬*(p + 3) ∧ stop ≡ 0) return m + 3;
    *(p + 4) = x.l ≫ 24;
    if (¬*(p + 4) ∧ (stop ≡ 0 ∨ (stop > 0 ∧ x.l < #10000))) return m + 4;
    *(p + 5) = (x.l ≫ 16) & #ff;
    if (¬*(p + 5) ∧ stop ≡ 0) return m + 5;
    *(p + 6) = (x.l ≫ 8) & #ff;
    if (¬*(p + 6) ∧ (stop ≡ 0 ∨ (stop > 0 ∧ (x.l & #ffff) ≡ 0))) return m + 6;
    *(p + 7) = x.l & #ff;
    if (¬*(p + 7) ∧ stop ≡ 0) return m + 7;
    p += 8, m += 8, a = incr(a, 8);
  }
```

This code is used in section 381.

**384.** The subroutine mmputchars(buf, size, addr) puts size characters into the simulated memory starting at address addr.

```
⟨Subroutines 14⟩ +≡
  void mmputchars(buf, size, addr)
       unsigned char *buf;
       int size;
       octa addr;
  {
    register unsigned char *p;
    register int m;
    octa a, x;
    if (((addr.h & #9fffffff) ∨ (incr(addr, size - 1).h & #9fffffff)) ∧ size) {
      fprintf(stderr, "Attempt to put characters off the page!\n");
      return;
    }
    for (p = buf, m = 0, a = addr, a.h ≫= 29; m < size; ) {
      if ((a.l & #7) ∨ m > size - 8) ⟨Load and write one byte 385⟩
      else ⟨Load and write eight bytes 386⟩;
    }
  }
```

```
385. ⟨Load and write one byte 385⟩ ≡
  {
    register int s = 8 * ((~a.l) & #3);
    x = magic_read(a);
    if (a.l & #4) x.l ⊕= (((x.l ≫ s) ⊕ *p) & #ff) ≪ s;
    else x.h ⊕= (((x.h ≫ s) ⊕ *p) & #ff) ≪ s;
    magic_write(a, x);
    p++, m++, a = incr(a, 1);
  }
```

This code is used in section 384.

```
386. ⟨Load and write eight bytes 386⟩ ≡
  {
    x.h = (*p ≪ 24) + (*(p + 1) ≪ 16) + (*(p + 2) ≪ 8) + *(p + 3);
    x.l = (*(p + 4) ≪ 24) + (*(p + 5) ≪ 16) + (*(p + 6) ≪ 8) + *(p + 7);
    magic_write(a, x);
    p += 8, m += 8, a = incr(a, 8);
  }
```

This code is used in section 384.
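The single-byte cases in sections 382 and 385 treat an octabyte as eight big-endian bytes split across `h` and `l`. Bit 2 of the address picks the half, and `8 * ((~a) & 3)` is the shift that brings the addressed byte to the bottom. The sketch below restates both directions on a two-field `octa`; `get_byte` and `put_byte` are hypothetical names. The store uses the same exclusive-or trick as section 385, which changes only the addressed byte.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;   /* high and low 32-bit halves */

/* Returns the byte of x selected by the low three bits of address a
   (offset 0 is the most significant byte). */
static unsigned char get_byte(octa x, uint32_t a)
{
    int s = 8 * ((~a) & 3);
    return (unsigned char)(((a & 4) ? x.l >> s : x.h >> s) & 0xff);
}

/* Returns x with the byte selected by a replaced by c.
   XORing with (old ^ new) flips exactly the bits that differ. */
static octa put_byte(octa x, uint32_t a, unsigned char c)
{
    int s = 8 * ((~a) & 3);
    if (a & 4) x.l ^= (((x.l >> s) ^ c) & 0xff) << s;
    else       x.h ^= (((x.h >> s) ^ c) & 0xff) << s;
    return x;
}
```

On the octabyte `#0123456789abcdef`, offset 0 yields `#01`, offset 5 yields `#ab`, and storing zero at offset 5 leaves `#01234567 8900cdef`.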

**387.** When standard input is being read by the simulated program at the same time as it is being used for interaction, we try to keep the two uses separate by maintaining a private buffer for the simulated program's StdIn. Online input is usually transmitted from the keyboard to a C program a line at a time; therefore an *fgets* operation works much better than *fread* when we prompt for new input. But there is a slight complication, because *fgets* might read a null character before coming to a newline character. We cannot deduce the number of characters read by *fgets* simply by looking at  $strlen(stdin\_buf)$ .

```
⟨Subroutines 14⟩ +≡
  char stdin_chr()
  {
    register char *p;
    while (stdin_buf_start ≡ stdin_buf_end) {
      printf("StdIn> "); fflush(stdout);
      fgets(stdin_buf, 256, stdin);
      stdin_buf_start = stdin_buf;
      for (p = stdin_buf; p < stdin_buf + 254; p++)
        if (*p ≡ '\n') break;
      stdin_buf_end = p + 1;
    }
    return *stdin_buf_start++;
  }

388. ⟨Global variables 20⟩ +≡
  char stdin_buf[256];     /* standard input to the simulated program */
  char *stdin_buf_start;   /* current position in that buffer */
  char *stdin_buf_end;     /* current end of that buffer */
```
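The reason for scanning to the newline instead of calling strlen can be seen on a buffer that contains a null byte in mid-line. The helper below is a sketch of that scan in isolation; `line_end` is a hypothetical name, and like the loop in section 387 it assumes that any newline it meets belongs to the line just read.

```c
#include <stddef.h>

/* Returns a pointer just past the first newline in buf, looking at no
   more than the first cap - 2 characters; if none is found there, the
   result is buf + cap - 1. Embedded null bytes do not stop the scan. */
static char *line_end(char *buf, size_t cap)
{
    char *p;
    for (p = buf; p < buf + cap - 2; p++)
        if (*p == '\n') break;
    return p + 1;
}
```

If the buffer holds the seven bytes `a b \0 c d \n \0`, strlen reports 2, whereas `line_end` reports that six characters were read.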

## 389. Names of the sections.

```
⟨Allocate a slot p in the S-cache 218⟩ Used in section 217.
⟨Assign a functional unit if available, otherwise goto stall 82⟩ Used in section 75.
⟨Begin an interruption and break 317⟩ Used in section 146.
⟨Begin execution of a stage-two operation 351⟩ Used in section 135.
⟨Begin execution of an operation 132⟩ Used in section 130.
⟨Begin fetch with known physical address 296⟩ Used in section 288.
⟨Begin fetch without I-cache lookup 295⟩ Used in section 291.
⟨Cases 0 through 4, for the D-cache 233⟩ Used in section 232.
⟨Cases 5 through 9, for the S-cache 234⟩ Used in section 232.
⟨Cases for control of special coroutines 126, 215, 217, 222, 224, 232, 237, 257⟩
⟨Cases for stage 1 execution 155, 313, 325, 327, 328, 329, 331, 356⟩ Used in section 132.
⟨Cases to compute the results of register-to-register operation 137, 138, 139, 140, 141, 142, 143, 343, 344, 345, 346, 348, 350⟩ Used in section 132.
⟨Cases to compute the virtual address of a memory operation 265⟩ Used in section 132.
⟨Check for a hit in pending writes 278⟩ Used in section 273.
⟨Check for external interrupt 314⟩ Used in section 64.
⟨Check for prest with a fully spanned cache block 275⟩ Used in section 274.
⟨Check for security violation, break if so 149⟩ Used in section 67.
⟨Check for sufficient rename registers and memory slots, or goto stall 111⟩ Used in section 75.
⟨Check the protection bits and get the physical address 269⟩ Used in sections 268, 270, and 272.
⟨Clean the D-cache block for data→z.o, if any 366⟩ Used in section 364.
⟨Clean the data caches 361⟩ Used in section 356.
⟨Clean the I-cache block for data→z.o, if any 365⟩ Used in section 364.
⟨Clean the S-cache block for data→z.o, if any 367⟩ Used in section 364.
⟨Commit and/or deissue up to commit_max instructions 67⟩ Used in section 64.
⟨Commit the hottest instruction, or break if it's not ready 146⟩ Used in section 67.
⟨Commit to memory if possible, otherwise break 256⟩ Used in section 146.
⟨Compute the new entry for c→inbuf and give the caller a sneak preview 245⟩ Used in section 237.
⟨Continue this command on the next cache block 369⟩ Used in section 364.
⟨Convert relative address to absolute address 84⟩ Used in section 75.
⟨Copy data from p into c→inbuf 226⟩ Used in section 224.
⟨Copy Scache→inbuf to slot p 220⟩ Used in section 217.
⟨Copy the data from block q to fetched 294⟩ Used in sections 292 and 296.
```

```
a: octa, §384.
fflush: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
fread: size_t (), <stdio.h>.
h: tetra, §17.
l: tetra, §17.
m: register int, §384.
magic_write: void (), §379.
p: register unsigned char *, §384.
printf: int (), <stdio.h>.
stdin: FILE *, <stdio.h>.
stdout: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
x: octa, §384.
```

```
⟨Declare mmix_opcode and internal_opcode 47, 49⟩ Used in section 44.
⟨Deissue all but the hottest command 316⟩ Used in section 314.
⟨Deissue the coolest instruction 145⟩ Used in section 67.
⟨Determine the flags, f, and the internal opcode, i 80⟩ Used in section 75.
⟨Dispatch an instruction to the cool block if possible, otherwise goto stall 101⟩ Used in section 75.
⟨Dispatch one cycle's worth of instructions 74⟩ Used in section 64.
⟨Do a simultaneous lookup in the D-cache 268⟩ Used in section 267.
⟨Do a simultaneous lookup in the I-cache 292⟩ Used in section 291.
⟨Do load/store stage 1 without D-cache lookup 270⟩ Used in section 267.
⟨Do load/store stage 1 with known physical address 271⟩ Used in section 266.
⟨Do load/store stage 2 without D-cache lookup 277⟩ Used in section 273.
⟨Do stage 1 of LDVTS 353⟩ Used in section 352.
⟨Do the final SAVE 340⟩ Used in section 339.
⟨Either halt or print warning 373⟩ Used in section 372.
⟨Execute all coroutines scheduled for the current time 125⟩ Used in section 64.
⟨External prototypes 9, 38, 161, 175, 178, 180, 209, 212, 252⟩ Used in sections 3 and 5.
⟨External routines 10, 39, 162, 176, 179, 181, 210, 213, 253⟩ Used in section 3.
⟨External variables 4, 29, 59, 60, 66, 69, 77, 86, 87, 98, 115, 136, 150, 168, 207, 211, 214, 242, 247, 284, 349⟩ Used in sections 3 and 5.
⟨Fill Scache→inbuf with clean memory data 219⟩ Used in section 217.
⟨Finish a CSWAP 283⟩ Used in section 281.
⟨Finish a store command 281⟩ Used in section 280.
⟨Finish execution of an operation 144⟩ Used in section 130.
⟨Forward the new data past the D-cache if it is write-through 263⟩ Used in section 257.
⟨Generate an instruction to save g[yy] 339⟩ Used in section 337.
⟨Generate an instruction to unsave g[yy] 333⟩ Used in section 332.
⟨Get ready for the next step of PREGO 229⟩ Used in section 81.
⟨Get ready for the next step of PRELD or PREST 228⟩ Used in section 81.
⟨Get ready for the next step of SAVE 341⟩ Used in section 81.
⟨Get ready for the next step of UNSAVE 335⟩ Used in section 81.
⟨Global variables 20, 36, 41, 48, 50, 51, 53, 54, 65, 70, 78, 83, 88, 99, 107, 127, 148, 154, 194, 230, 235, 238, 248, 285, 303, 305, 315, 374, 376, 388⟩ Used in section 3.
⟨Handle an internal SAVE when it's time to store 342⟩ Used in section 281.
⟨Handle an internal UNSAVE when it's time to load 336⟩ Used in section 279.
⟨Handle interrupt at end of execution stage 307⟩ Used in section 144.
⟨Handle special cases for operations like prego and ldvts 289, 352⟩ Used in section 266.
⟨Handle write-around when flushing to the S-cache 221⟩ Used in section 217.
⟨Handle write-around when writing to the D-cache 259⟩ Used in section 257.
⟨Header definitions 6, 7, 8, 52, 57, 129, 166⟩ Used in sections 3 and 5.
⟨Ignore the item in write_head 264⟩ Used in section 257.
⟨Initialize everything 22, 26, 61, 71, 79, 89, 116, 128, 153, 231, 236, 249, 286⟩ Used in section 10.
⟨Insert an instruction to advance beta and L 112⟩ Used in section 110.
```

```
⟨Insert an instruction to advance gamma 113⟩ Used in sections 110, 119, and 337.
⟨Insert an instruction to decrease gamma 114⟩ Used in section 120.
⟨Insert data→b.o into the proper field of data→x.o, checking for arithmetic exceptions if signed 282⟩ Used in section 281.
⟨Insert dummy instruction for page table emulation 302⟩ Used in section 298.
⟨Insert special operands when resuming an interrupted operation 324⟩ Used in section 103.
⟨Install a new instruction into the tail position 304⟩ Used in section 301.
⟨Install default fields in the cool block 100⟩ Used in section 75.
⟨Install register X as the destination, or insert an internal command and goto dispatch_done if X is marginal 110⟩ Used in section 101.
⟨Install the operand fields of the cool block 103⟩ Used in section 101.
⟨Internal prototypes 13, 18, 24, 27, 30, 32, 34, 42, 45, 55, 62, 72, 90, 92, 94, 96, 156, 158, 169, 171, 173, 182, 184, 186, 188, 190, 192, 195, 198, 200, 202, 204, 240, 250, 254, 377⟩ Used in section 3.
⟨Issue j pseudo-instructions to compute a page table entry 244⟩ Used in section 243.
⟨Issue the cool instruction 81⟩ Used in section 75.
⟨Load and write eight bytes 386⟩ Used in section 384.
⟨Load and write one byte 385⟩ Used in section 384.
⟨Local variables 12, 124, 258⟩ Used in section 10.
⟨Look at the head instruction, and try to dispatch it if j < dispatch_max 75⟩ Used in section 74.
⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩ Used in section 266.
⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩ Used in section 288.
⟨Magically do an I/O operation, if cool→loc is rT 372⟩ Used in section 322.
⟨Make sure cool_L and cool_G are up to date 102⟩ Used in section 101.
⟨Nullify the hottest instruction 147⟩ Used in section 146.
⟨Other cases for the fetch coroutine 298, 301⟩ Used in section 288.
⟨Pass data to the next stage of the pipeline 134⟩ Used in section 130.
⟨Perform one cycle of the interrupt preparations 318⟩ Used in section 64.
⟨Perform one machine cycle 64⟩ Used in section 10.
⟨Predict a branch outcome 151⟩ Used in section 85.
⟨Prepare for exceptional trip handler 308⟩ Used in section 307.
⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩ Used in section 372.
⟨Prepare to emulate the page translation 309⟩ Used in section 310.
⟨Print all of c's cache blocks 177⟩ Used in section 176.
⟨Read and store one byte; return if done 382⟩ Used in section 381.
⟨Read and store up to eight bytes; return if done 383⟩ Used in section 381.
⟨Read data into c→inbuf and wait for the bus 223⟩ Used in section 222.
⟨Read from memory into fetched 297⟩ Used in section 296.
⟨Record the result of branch prediction 152⟩ Used in section 75.
⟨Recover from incorrect branch prediction 160⟩ Used in section 155.
⟨Redirect the fetch if control changes at this inst 85⟩ Used in section 75.
⟨Restart the fetch coroutine 287⟩ Used in sections 85, 160, 308, 309, and 316.
```

⟨Resume an interrupted operation 323⟩ Used in section 322.

```
⟨Set cool→b and/or cool→ra from special register 108⟩ Used in section 103.
⟨Set cool→b from register X 106⟩ Used in section 103.
⟨Set cool→y from register Y 105⟩ Used in section 103.
⟨Set cool→z as an immediate wyde 109⟩ Used in section 103.
⟨Set cool→z from register Z 104⟩ Used in section 103.
⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩ Used in section 318.
⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩ Used in section 318.
⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩ Used in section 318.
⟨Set things up so that the results become known when they should 133⟩ Used in section 132.
⟨Set up the first phase of saving 338⟩ Used in section 337.
⟨Set up the first phase of unsaving 334⟩ Used in section 332.
⟨Simulate an action of the fetch coroutine 288⟩ Used in section 125.
⟨Simulate later stages of an execution pipeline 135⟩ Used in section 125.
⟨Simulate the first stage of an execution pipeline 130⟩ Used in section 125.
⟨Special cases for states in later stages 272, 273, 276, 279, 280, 299, 311, 354, 364, 370⟩ Used in section 135.
⟨Special cases for states in the first stage 266, 310, 326, 360, 363⟩ Used in section 130.
⟨Special cases of instruction dispatch 117, 118, 119, 120, 121, 122, 227, 312, 322, 332, 337, 347, 355⟩ Used in section 101.
⟨Start the S-cache filler 225⟩ Used in section 224.
⟨Start up auxiliary coroutines to compute the page table entry 243⟩ Used in section 237.
⟨Subroutines 14, 19, 21, 25, 28, 31, 33, 35, 43, 46, 56, 63, 73, 91, 93, 95, 97, 157, 159, 170, 172, 174, 183, 185, 187, 189, 191, 193, 196, 199, 201, 203, 205, 208, 241, 251, 255, 378, 379, 381, 384, 387⟩ Used in section 3.
⟨Swap cache blocks p and q 197⟩ Used in sections 196 and 205.
⟨Try to get the contents of location data→z.o in the D-cache 274⟩ Used in section 273.
⟨Try to get the contents of location data→z.o in the I-cache 300⟩ Used in section 298.
⟨Try to put the contents of location write_head→addr into the D-cache 261⟩ Used in section 257.
⟨Type definitions 11, 17, 23, 37, 40, 44, 68, 76, 164, 167, 206, 246, 371⟩ Used in sections 3
⟨Undo data structures set prematurely in the cool block and break 123⟩ Used in
⟨Update IT-cache usage and check the protection bits 293⟩ Used in sections 292 and 295.
⟨Update rG 330⟩ Used in section 329.
⟨Update the page variables 239⟩ Used in section 329.
⟨Use cleanup on the cache blocks for data→z.o, if any 368⟩ Used in section 364.
⟨Wait for input data if necessary; set state = 1 if it's there 131⟩ Used in section 130.
⟨Wait if there's an unfinished load ahead of us 357⟩ Used in section 356.
⟨Wait till write buffer is empty 362⟩ Used in sections 361 and 364.
⟨Wait, if necessary, until the instruction pointer is known 290⟩ Used in section 288.
⟨Write directly from write_head to memory 260⟩ Used in section 257.
```

```
⟨Write the data into the D-cache and set state = 4, if there's a cache hit 262⟩ Used in section 257.
⟨Write the dirty data of c→outbuf and wait for the bus 216⟩ Used in section 215.
⟨Zap the instruction and data caches 359⟩ Used in section 356.
⟨Zap the translation caches 358⟩ Used in section 356.
⟨mmix-pipe.h 5⟩
```

## MMIX-SIM

**1. Introduction.** This program simulates a simplified version of the MMIX computer. Its main goal is to help people create and test MMIX programs for *The Art of Computer Programming* and related publications. It provides only a rudimentary terminal-oriented interface, but it has enough infrastructure to support a cool graphical user interface which could be added by a motivated reader. (Hint, hint.) MMIX is simplified in the following ways:

- There is no pipeline, and there are no caches. Thus, commands like SYNC and SYNCD and PREGO do nothing.
- No trap interrupts are implemented, except for a few special cases of TRAP that provide rudimentary input-output.
- All instructions take a fixed amount of time, given by the rough estimates stated in the MMIX documentation. For example, MUL takes $10\upsilon$, LDB takes $\mu+\upsilon$; all times are expressed in terms of $\mu$ and $\upsilon$, "mems" and "oops." The simulated clock increases by $2^{32}$ for each $\mu$ and 1 for each $\upsilon$. But the interval counter rI decreases by 1 for each $\upsilon$; and the usage count field of rU may increase by 1 (modulo $2^{47}$) for each instruction.
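
The clock convention in the last item packs two counters into one 64-bit number: mems accumulate in the upper half and oops in the lower half. The sketch below illustrates that bookkeeping only. The type `sim_clock` and the three functions are hypothetical names and do not reflect how MMIX-SIM itself stores its clock.

```c
#include <stdint.h>

/* A 64-bit simulated clock: each mem adds 2^32, each oop adds 1. */
typedef struct { uint64_t ticks; } sim_clock;

/* Charges an instruction that costs `mems` mu and `oops` upsilon. */
static void charge(sim_clock *c, uint32_t mems, uint32_t oops)
{
    c->ticks += ((uint64_t)mems << 32) + oops;
}

/* Upper half of the clock. */
static uint32_t mems_of(const sim_clock *c) { return (uint32_t)(c->ticks >> 32); }

/* Lower half of the clock. */
static uint32_t oops_of(const sim_clock *c) { return (uint32_t)c->ticks; }
```

Charging one MUL ($10\upsilon$) and one LDB ($\mu+\upsilon$) from zero leaves 1 in the upper half and 11 in the lower half. In this sketch the two halves stay separate only while fewer than $2^{32}$ oops have been charged.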
- 2. To run this simulator, assuming UNIX conventions, you say 'mmix (options) progfile args...', where progfile is an output of the MMIXAL assembler, args... is a sequence of optional command line arguments passed to the simulated program, and (options) is any subset of the following:
- -t<n> Trace each instruction the first n times it is executed. (The notation <n> in this option, and in several other options and interactive commands below, stands for a decimal integer.)
- -e<x> Trace each instruction that raises an arithmetic exception belonging to the given bit pattern. (The notation <x> in this option, and in several other commands below, stands for a hexadecimal integer.) The exception bits are DVWIOUZX as they appear in rA, namely #80 for D (integer divide check), #40 for V (integer overflow), ..., #01 for X (floating inexact). The option -e by itself is equivalent to -eff, tracing all eight exceptions.
- -r Trace details of the register stack. This option shows all the "hidden" loads and stores that occur when octabytes are written from the ring of local registers into memory, or read from memory into that ring. It also shows the full details of SAVE and UNSAVE operations.
- -l<n> List the source line corresponding to each traced instruction, filling gaps of length n or less. For example, if one instruction came from line 10 of the source file and the next instruction to be traced came from line 12, line 11 would be shown also, provided that $n \ge 1$. If <n> is omitted it is assumed to be 3.

- -s Show statistics of running time with each traced instruction.
- -P Show the program profile (that is, the frequency counts of each instruction that was executed) when the simulation ends.
- -L<n> List the source lines corresponding to each instruction that appears in the program profile, filling gaps of length n or less. This option implies -P. If <n> is omitted it is assumed to be 3.
- -v Be verbose: Turn on all options. (More precisely, the -v option is shorthand for -t999999999 -e -r -s -l10 -L10.)
- -q Be quiet: Cancel all previously specified options.
- -i Go into interactive mode before starting the simulation.
- -I Go into interactive mode when the simulated program halts or pauses for a breakpoint.
- -b<n> Set the buffer size of source lines to max(72, n).
- -c<n> Set the capacity of the local register ring to  $\max(256, n)$ ; this number must be a power of 2.
- -f<filename> Use the named file for standard input to the simulated program. This option should be used whenever the simulator is not being used interactively, because the simulator will not recognize end of file when standard input has been defined in any other way.
- -D<filename> Prepare the named file for use by other simulators, instead of actually doing a simulation.
- -? Print the "Usage" message, which summarizes the command line options.

The author recommends -t2 -l -L for initial offline debugging.

While the program is being simulated, an *interrupt* signal (usually control-C) will cause the simulator to break and go into interactive mode after tracing the current instruction, even if -i and -I were not specified on the command line.
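The bit pattern expected by -e<x> can be worked out from the letter order DVWIOUZX given above. The following sketch is only an illustration; the helper name `exception_mask` is hypothetical and is not part of the simulator.

```c
#include <string.h>

/* Hypothetical helper: converts letters drawn from "DVWIOUZX" into the
   bit pattern used by -e<x>.  Returns -1 if a letter is not recognized. */
int exception_mask(const char *letters)
{
    static const char order[] = "DVWIOUZX";   /* D is #80, ..., X is #01 */
    int mask = 0;
    for (; *letters; letters++) {
        const char *p = strchr(order, *letters);
        if (!p) return -1;
        mask |= 0x80 >> (p - order);          /* one bit per letter */
    }
    return mask;
}
```

For instance, the letters V and O together give #48, so tracing integer overflow and floating overflow would correspond to the option -e48.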

**3.** In interactive mode, the user is prompted 'mmix>' and a variety of commands can be typed online. Any command line option can be given in response to such a prompt (including the '-' that begins the option), and the following operations are also available:

- Simply typing ⟨return⟩ or n⟨return⟩ to the mmix> prompt causes one MMIX instruction to be executed and traced; then the user is prompted again.
- c continues simulation until the program halts or reaches a breakpoint. (Actually the command is 'c⟨return⟩', but we won't bother to mention the ⟨return⟩ in the following description.)
- q quits (terminates the simulation), after printing the profile (if it was requested) and the final statistics.
- s prints out the current statistics (the clock times and the current instruction location). We have already discussed the -s option on the command line, which causes these statistics to be printed automatically; but a lot of statistics can fill up a lot of file space, so users may prefer to see the statistics only on demand.
- l<n><t>, g<n><t>, \$<n><t>, rA<t>, rB<t>, ..., rZZ<t>, and M<x><t> will show the current value of a local register, global register, dynamically numbered register, special register, or memory location. Here <t> specifies the type of value to be displayed; if <t> is '!', the value will be given in decimal notation; if <t> is '.' it will be given in floating point notation; if <t> is '#' it will be given in hexadecimal, and if <t> is '"' it will be given as a string of eight one-byte characters. Just typing <t> by itself will repeat the most recently shown value, perhaps in another format; for example, the command 'l10#' will show local register 10 in hexadecimal notation, then the command '!' will show it in decimal and '.' will show it as a floating point number. If <t> is empty, the previous type will be repeated; the default type is decimal. Register rA is equivalent to g22, according to the numbering used in GET and PUT commands.

The '<t>' in any of these commands can also have the form '=<value>', where the value is a decimal or floating point or hexadecimal or string constant. (The syntax rules for floating point constants appear in MMIX-ARITH. A string constant is treated as in the BYTE command of MMIXAL, but padded at the left with zeros if fewer than eight characters are specified.) This assigns a new value before displaying it. For example, 'l10=.1e3' sets local register 10 equal to 100; 'g250="ABCD",#a' sets global register 250 equal to #000000414243440a; 'M1000=-Inf' sets $M_8[$#1000$]$ = #fff0000000000000, the representation of $-\infty$. Special registers other than rI cannot be set to values disallowed by PUT. Marginal registers cannot be set to nonzero values.

The command 'rI=250' sets the interval counter to 250; this will cause a break in simulation after $250\upsilon$ have elapsed.

- +<n><t> shows the next n octabytes following the one most recently shown, in format <t>. For example, after 'l10#' a subsequent '+30' will show l11, l12, ..., l40 in hexadecimal notation. After 'g200=3' a subsequent '+30' will set g201, g202, ..., g230 equal to 3, but a subsequent '+30!' would merely display g201 through g230 in decimal notation. Memory addresses will advance by 8 instead of by 1. If <n> is empty, the default value n=1 is used.
- @<x> sets the address of the next tetrabyte to be simulated, sort of like a GO command.
- t<x> says that the instruction in tetrabyte location x should always be traced, regardless of its frequency count.
- u<x> undoes the effect of t<x>.
- b[rwx]<x> sets breakpoints at tetrabyte x; here [rwx] stands for any subset of the letters r, w, and/or x, meaning to break when the tetrabyte is read, written, and/or executed. For example, 'bx1000' causes a break in the simulation just after the tetrabyte in #1000 is executed; 'b1000' undoes this breakpoint; 'brwx1000' causes a break just after any simulated instruction loads, stores, or appears in tetrabyte number #1000.
- T, D, P, S changes the "current segment" to either Text\_Segment, Data\_Segment, Pool\_Segment, or Stack\_Segment, respectively, namely to #0, #2000000000000000, #4000000000000000, or #6000000000000000. The current segment, initially #0, is added to all memory addresses in M, @, t, u, and b commands.
- B lists all current breakpoints and tracepoints.
- i<filename> reads a sequence of interactive commands from the specified file, one command per line, ignoring blank lines. This feature can be used to set many breakpoints or to display a number of key registers, etc. Included lines that begin with % or i are ignored; therefore an included file cannot include another file. Included lines that begin with a blank space are reproduced in the standard output, otherwise ignored.
- h (help) reminds the user of the available interactive commands.

**4. Rudimentary I/O.** Input and output are provided by the following ten primitive system calls:

- Fopen(handle, name, mode). Here handle is a one-byte integer, name is the address of the first byte of a string, and mode is one of the values TextRead, TextWrite, BinaryRead, BinaryWrite, BinaryReadWrite. An Fopen call associates handle with the external file called name and prepares to do input and/or output on that file. It returns 0 if the file was opened successfully; otherwise returns the value -1. If mode is TextWrite, BinaryWrite, or BinaryReadWrite, any previous contents of the named file are discarded. If mode is TextRead or TextWrite, the file consists of "lines" terminated by "newline" characters, and it is said to be a text file; otherwise the file consists of uninterpreted bytes, and it is said to be a binary file.

Text files and binary files are essentially equivalent in cases where this simulator is hosted by an operating system derived from UNIX; in such cases files can be written as text and read as binary or vice versa. But with other operating systems, text files and binary files often have quite different representations, and certain characters with byte codes less than ' $_{\sqcup}$ ' are forbidden in text. Within any MMIX program, the newline character has byte code  $^{\#}$ 0a = 10.

At the beginning of a program three handles have already been opened: The "standard input" file StdIn (handle 0) has mode TextRead, the "standard output" file StdOut (handle 1) has mode TextWrite, and the "standard error" file StdErr (handle 2) also has mode TextWrite. When this simulator is being run interactively, lines of standard input should be typed following a prompt that says 'StdIn> ', unless the -f option has been used. The standard output and standard error files of the simulated program are intermixed with the output of the simulator itself.

The input/output operations supported by this simulator can perhaps be understood most easily with reference to the standard library stdio that comes with the C language, because the conventions of C have been explained in hundreds of books. If we declare an array FILE \*file[256] and set file[0] = stdin, file[1] = stdout, and file[2] = stderr, then the simulated system call Fopen(handle, name, mode) is essentially equivalent to the C expression

```
(file[handle] ? (file[handle] = freopen(name, mode_string[mode], file[handle]))
              : (file[handle] = fopen(name, mode_string[mode]))) ? 0 : -1,
```

if we predefine the values mode\_string[TextRead] = "r", mode\_string[TextWrite] = "w", mode\_string[BinaryRead] = "rb", mode\_string[BinaryWrite] = "wb", and mode\_string[BinaryReadWrite] = "wb+".

• Fclose(handle). If the given file handle has been opened, it is closed—no longer associated with any file. Again the result is 0 if successful, or -1 if the file was already closed or unclosable. The C equivalent is

$$fclose(file[handle])? -1:0$$

with the additional side effect of setting  $file[handle] = \Lambda$ .

• Fread(handle, buffer, size). The file handle should have been opened with mode TextRead, BinaryRead, or BinaryReadWrite. The next size characters are read into MMIX's memory starting at address buffer. If an error occurs, the value -1-size is returned; otherwise, if the end of file does not intervene, 0 is returned; otherwise the negative value n-size is returned, where n is the number of characters successfully read and stored. The statement

$$fread(buffer, 1, size, file[handle]) - size$$

has the equivalent effect in C, in the absence of file errors.

• Fgets(handle, buffer, size). The file handle should have been opened with mode TextRead, BinaryRead, or BinaryReadWrite. Characters are read into MMIX's memory starting at address buffer, until either size -1 characters have been read and stored or a newline character has been read and stored; the next byte in memory is then set to zero. If an error or end of file occurs before reading is complete, the memory contents are undefined and the value -1 is returned; otherwise the number of characters successfully read and stored is returned. The equivalent in C is

$$fgets(buffer, size, file[handle])\;?\;strlen(buffer) : -1$$

if we assume that no null characters were read in; null characters may, however, precede a newline, and they are counted just like other characters.

• Fgetws(handle, buffer, size). This command is the same as Fgets, except that it applies to wyde characters instead of one-byte characters. Up to size-1 wyde characters are read; a wyde newline is #000a. The C version, using conventions of the ISO multibyte string extension (MSE), is approximately

$$fgetws(buffer, size, file[handle])\;?\;wcslen(buffer) : -1$$

where buffer now has type **wchar\_t** \*.

• Fwrite(handle, buffer, size). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. The next size characters are written from MMIX's memory starting at address buffer. If no error occurs, 0 is returned; otherwise the negative value n-size is returned, where n is the number of characters successfully written. The statement

$$fwrite(buffer, 1, size, file[handle]) - size$$

together with fflush(file[handle]) has the equivalent effect in C.

• Fputs(handle, string). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. One-byte characters are written from MMIX's memory to the file, starting at address string, up to but not including the first byte equal to zero. The number of bytes written is returned, or -1 on error. The C version is

$$fputs(string, file[handle]) \ge 0 ? strlen(string) : -1,$$

together with fflush(file[handle]).

• Fputws(handle, string). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. Wyde characters are written from MMIX's memory to the file, starting at address string, up to but not including the first wyde equal to zero. The number of wydes written is returned, or -1 on error. The C+MSE version is

$$fputws(string, file[handle]) \ge 0\;?\;wcslen(string) : -1$$

together with fflush(file[handle]), where string now has type wchar\_t \*.

• Fseek(handle, offset). The file handle should have been opened with one of the modes BinaryRead, BinaryWrite, or BinaryReadWrite. This operation causes the next input or output operation to begin at offset bytes from the beginning of the file, if offset  $\geq 0$ , or at -offset-1 bytes before the end of the file, if offset < 0. (For example, offset = 0 "rewinds" the file to its very beginning; offset = -1 moves forward all the way to the end.) The result is 0 if successful, or -1 if the stated positioning could not be done. The C version is

```
fseek(file[handle], offset < 0 ? offset + 1 : offset ,  offset < 0 ? SEEK_END : SEEK_SET)? -1: 0.
```

If a file in mode BinaryReadWrite is used for both reading and writing, an Fseek command must be given when switching from input to output or from output to input.

• Ftell(handle). The file handle should have been opened with mode BinaryRead, BinaryWrite, or BinaryReadWrite. This operation returns the current file position, measured in bytes from the beginning, or -1 if an error has occurred. In this case the C function

$$ftell(file[handle])$$

has exactly the same meaning.

Although these ten operations are quite primitive, they provide the necessary functionality for extremely complex input/output behavior. For example, every function in the stdio library of C, with the exception of the two administrative operations remove and rename, can be implemented as a subroutine in terms of the six basic operations Fopen, Fclose, Fread, Fwrite, Fseek, and Ftell.

Notice that the MMIX function calls are much more consistent than those in the C library. The first argument is always a handle; the second, if present, is always an address; the third, if present, is always a size. The result returned is always nonnegative if the operation was successful, negative if an anomaly arose. These common features make the functions reasonably easy to remember.
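The C equivalents quoted above for Fopen, Fclose, and Fseek can be gathered into a small sketch. This is an illustration under stated assumptions, not the simulator's code: the function names `sim_fopen`, `sim_fclose`, and `sim_fseek` are hypothetical, the numeric values of the five modes are assumed to follow the order in which the text lists them, and the range check on `mode` is an added safeguard. The three standard handles are left uninitialized here.

```c
#include <stdio.h>

/* Assumed numbering of the five modes; only the names come from the text. */
enum { TextRead, TextWrite, BinaryRead, BinaryWrite, BinaryReadWrite };

static const char *mode_string[] = { "r", "w", "rb", "wb", "wb+" };
static FILE *file[256];   /* file[h] is NULL while handle h is closed */

/* Fopen: returns 0 on success, -1 on failure. */
int sim_fopen(unsigned char handle, const char *name, int mode)
{
    if (mode < TextRead || mode > BinaryReadWrite) return -1;
    file[handle] = file[handle]
        ? freopen(name, mode_string[mode], file[handle])
        : fopen(name, mode_string[mode]);
    return file[handle] ? 0 : -1;
}

/* Fclose: returns 0 on success, -1 if already closed or unclosable. */
int sim_fclose(unsigned char handle)
{
    int failed;
    if (!file[handle]) return -1;
    failed = fclose(file[handle]);
    file[handle] = NULL;
    return failed ? -1 : 0;
}

/* Fseek: offset >= 0 counts from the beginning of the file;
   offset < 0 means -offset-1 bytes before the end. */
int sim_fseek(unsigned char handle, long offset)
{
    if (!file[handle]) return -1;
    return fseek(file[handle],
                 offset < 0 ? offset + 1 : offset,
                 offset < 0 ? SEEK_END : SEEK_SET) ? -1 : 0;
}
```

All three functions follow the convention noted above: a nonnegative result signals success and a negative one signals an anomaly.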

5. The ten input/output operations of the previous section are invoked by TRAP commands with X = 0, Y = Fopen or Fclose or ... or Ftell, and Z = Handle. If there are two arguments, the second argument is placed in \$255. If there are three arguments, the address of the second is placed in \$255; the second argument is $M_8[\$255]$ and the third argument is $M_8[\$255+8]$. The returned value will be in \$255 when the system call is finished. (See the example below.)

6. The user program starts at symbolic location Main. At this time the global registers are initialized according to the GREG statements in the MMIXAL program, and \$255 is set to the numeric equivalent of Main. Local register \$0 is initially set to the number of command line arguments; and local register \$1 points to the first such argument, which is always a pointer to the program name. Each command line argument is a pointer to a string; the last such pointer is  $M_8[\$0 \ll 3+\$1]$ , and  $M_8[\$0 \ll 3+\$1+8]$  is zero. (Register \$1 will point to an octabyte in Pool\_Segment, and the command line strings will be in that segment too.) Location M[Pool\_Segment] will be the address of the first unused octabyte of the pool segment.

Registers rA, rB, rD, rE, rF, rH, rI, rJ, rM, rP, rQ, and rR are initially zero, and rL = 2.

A subroutine library loaded with the user program might need to initialize itself. If an instruction has been loaded into tetrabyte  $M_4[\#f0]$ , the simulator actually begins execution at #f0 instead of at Main; in this case \$255 holds the location of Main. (The routine at #f0 can pass control to Main without increasing rL, if it starts with the slightly tricky sequence

```
PUT rW, $255; PUT rB, $255; SETML $255, #F700; PUT rX, $255
```

and eventually says RESUME; this RESUME command will restore \$255 and rB. But the user program should *not* really count on the fact that rL is initially 2.)

7. The main program ends when MMIX executes the system call TRAP 0, which is often symbolically written 'TRAP 0, Halt, 0' to make its intention clear. The contents of \$255 at that time are considered to be the value "returned" by the main program, as in the *exit* statement of C; a nonzero value indicates an anomalous exit. All open files are closed when the program ends.

8. Here, for example, is a complete program that copies a text file to the standard output, given the name of the file to be copied. It includes all necessary error checking.

```
* SAMPLE PROGRAM: COPY A GIVEN FILE TO STANDARD OUTPUT
t        IS   $255
argc     IS   $0
argv     IS   $1
s        IS   $2
Buf_Size IS   1000
         LOC  Data_Segment
Buffer   LOC  @+Buf_Size
         GREG @
Arg0     OCTA 0,TextRead
Arg1     OCTA Buffer,Buf_Size
         LOC  #200              main(argc,argv) {
Main     CMP  t,argc,2          if (argc==2) goto openit
         PBZ  t,OpenIt
         GETA t,1F              fputs("Usage: ",stderr)
         TRAP 0,Fputs,StdErr
         LDOU t,argv,0          fputs(argv[0],stderr)
         TRAP 0,Fputs,StdErr
         GETA t,2F              fputs(" filename\n",stderr)
Quit     TRAP 0,Fputs,StdErr
         NEG  t,0,1             quit: exit(-1)
         TRAP 0,Halt,0
1H       BYTE "Usage: ",0
         LOC  (@+3)&-4          align to tetrabyte
2H       BYTE " filename",#a,0
OpenIt   LDOU s,argv,8          openit: s=argv[1]
         STOU s,Arg0
         LDA  t,Arg0            fopen(argv[1],"r",file[3])
         TRAP 0,Fopen,3
         PBNN t,CopyIt          if (no error) goto copyit
         GETA t,1F              fputs("Can't open file ",stderr)
         TRAP 0,Fputs,StdErr
         SET  t,s               fputs(argv[1],stderr)
         TRAP 0,Fputs,StdErr
         GETA t,2F              fputs("!\n",stderr)
         JMP  Quit              goto quit
1H       BYTE "Can't open file ",0
         LOC  (@+3)&-4          align to tetrabyte
2H       BYTE "!",#a,0
CopyIt   LDA  t,Arg1            copyit:
         TRAP 0,Fread,3         items=fread(buffer,1,buf_size,file[3])
         BN   t,EndIt           if (items < buf_size) goto endit
         LDA  t,Arg1            items=fwrite(buffer,1,buf_size,stdout)
         TRAP 0,Fwrite,StdOut
         PBNN t,CopyIt          if (items >= buf_size) goto copyit
Trouble  GETA t,1F              trouble: fputs("Trouble w...!",stderr)
         JMP  Quit              goto quit
1H       BYTE "Trouble writing StdOut!",#a,0
EndIt    INCL t,Buf_Size
         BN   t,ReadErr         if (ferror(file[3])) goto readerr
         STO  t,Arg1+8
         LDA  t,Arg1            n=fwrite(buffer,1,items,stdout)
         TRAP 0,Fwrite,StdOut
         BN   t,Trouble         if (n < items) goto trouble
         TRAP 0,Halt,0          exit(0)
ReadErr  GETA t,1F              readerr: fputs("Trouble r...!",stderr)
         JMP  Quit              goto quit }
1H       BYTE "Trouble reading!",#a,0
```

**9. Basics.** To get started, we define a type that provides semantic sugar.

```
⟨Type declarations 9⟩ ≡
  typedef enum {
    false, true
  } bool;
See also sections 10, 16, 38, 39, 54, 55, 59, 64, and 135.
```

This code is used in section 141.

10. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing (1999) was limited in that way. It uses subroutines from the MMIX-ARITH module, assuming only that type **tetra** represents unsigned 32-bit integers. The definition of **tetra** given here should be changed, if necessary, to agree with the definition in that module.

```
⟨ Type declarations 9⟩ +≡
  typedef unsigned int tetra;
  /* for systems conforming to the LP-64 data model */
  typedef struct {
    tetra h, l;
  } octa; /* two tetrabytes make one octabyte */
  typedef unsigned char byte; /* a monobyte */
```

11. We declare subroutines twice, once with a prototype and once with the oldstyle C conventions. The following hack makes this work with new compilers as well as the old standbys.

```
⟨Preprocessor macros 11⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
See also sections 43 and 46.
This code is used in section 141.

12. ⟨Subroutines 12⟩ ≡
  void print_hex ARGS((octa));
  void print_hex(o)
       octa o;
  {
    if (o.h) printf("%x%08x", o.h, o.l);
    else printf("%x", o.l);
  }
See also sections 13, 15, 17, 20, 26, 27, 42, 45, 47, 50, 82, 83, 91, 114, 117, 120, 137, 140, 143, 148,
     154, 160, 162, 165, and 166.
This code is used in section 141.
```

\_\_STDC\_\_, Standard C.

13. Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Division inputs the high half of a dividend in the global variable aux and returns the remainder in aux.

```
⟨Subroutines 12⟩ +≡
  extern octa zero_octa;   /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one;   /* neg_one.h = neg_one.l = -1 */
  extern octa aux, val;   /* auxiliary data */
  extern bool overflow;   /* flag set by signed multiplication and division */
  extern int exceptions;   /* bits set by floating point operations */
  extern int cur_round;   /* the current rounding mode */
  extern char *next_char;   /* where a scanned constant ended */
  extern octa oplus ARGS((octa y, octa z));   /* unsigned y + z */
  extern octa ominus ARGS((octa y, octa z));   /* unsigned y − z */
  extern octa incr ARGS((octa y, int delta));   /* unsigned y + δ (δ is signed) */
  extern octa oand ARGS((octa y, octa z));   /* y ∧ z */
  extern octa shift_left ARGS((octa y, int s));   /* y ≪ s, 0 ≤ s ≤ 64 */
  extern octa shift_right ARGS((octa y, int s, int u));   /* y ≫ s, signed if ¬u */
  extern octa omult ARGS((octa y, octa z));   /* unsigned (aux, x) = y × z */
  extern octa signed_omult ARGS((octa y, octa z));   /* signed x = y × z */
  extern octa odiv ARGS((octa x, octa y, octa z));   /* unsigned (x, y)/z; aux = (x, y) mod z */
  extern octa signed_odiv ARGS((octa y, octa z));   /* signed x = y/z */
  extern int count_bits ARGS((tetra z));   /* x = ν(z) */
  extern tetra byte_diff ARGS((tetra y, tetra z));   /* half of BDIF */
  extern tetra wyde_diff ARGS((tetra y, tetra z));   /* half of WDIF */
  extern octa bool_mult ARGS((octa y, octa z, bool xor));   /* MOR or MXOR */
  extern octa load_sf ARGS((tetra z));   /* load short float */
  extern tetra store_sf ARGS((octa x));   /* store short float */
  extern octa fplus ARGS((octa y, octa z));   /* floating point x = y ⊕ z */
  extern octa fmult ARGS((octa y, octa z));   /* floating point x = y ⊗ z */
  extern octa fdivide ARGS((octa y, octa z));   /* floating point x = y ⊘ z */
  extern octa froot ARGS((octa, int));   /* floating point x = √z */
  extern octa fremstep ARGS((octa y, octa z, int delta));   /* floating point x rem z = y rem z */
  extern octa fintegerize ARGS((octa z, int mode));   /* floating point x = round(z) */
  extern int fcomp ARGS((octa y, octa z));   /* −1, 0, 1, or 2 if y < z, y = z, y > z, y ∥ z */
  extern int fepscomp ARGS((octa y, octa z, octa eps, int sim));   /* x = sim ? [y ∼ z (ε)] : [y ≈ z (ε)] */
  extern octa floatit ARGS((octa z, int mode, int unsgnd, int shrt));   /* fix to float */
  extern octa fixit ARGS((octa z, int mode));   /* float to fix */
  extern void print_float ARGS((octa z));   /* print octabyte as floating decimal */
  extern int scan_const ARGS((char *buf));   /* val = floating or integer constant; returns the type */
```
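To make the two-tetrabyte representation concrete, here is a minimal sketch of how unsigned 64-bit addition can be carried out on 32-bit halves. It is not the MMIX-ARITH implementation of oplus; the name `oplus_sketch` is hypothetical, and `uint32_t` stands in for the **tetra** type.

```c
#include <stdint.h>

typedef uint32_t tetra;                 /* assumed: exactly 32 unsigned bits */
typedef struct { tetra h, l; } octa;    /* high and low halves, as in section 10 */

/* Sketch of unsigned y + z on two 32-bit halves. */
octa oplus_sketch(octa y, octa z)
{
    octa x;
    x.l = y.l + z.l;                    /* wraps modulo 2^32 */
    x.h = y.h + z.h + (x.l < y.l);      /* add the carry out of the low half */
    return x;
}
```

The carry is detected by noticing that the wrapped low sum is smaller than one of its operands, so no wider integer type is needed.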

14. Here's a quick check to see if arithmetic is in trouble.

```
#define panic(m) { fprintf(stderr, "Panic: %s!\n", m); exit(-2); }
⟨Initialize everything 14⟩ ≡
  if (shift_left(neg_one, 1).h ≠ #ffffffff)
    panic("Incorrect implementation of type tetra");
See also sections 18, 24, 32, 41, 77, and 147.
This code is used in section 141.
```

```
ARGS = macro (), §11.
aux: octa, MMIX-ARITH §4.
bool = enum, §9.
bool_mult: octa (), MMIX-ARITH §29.
byte_diff: tetra (), MMIX-ARITH §27.
count_bits: int (), MMIX-ARITH §26.
cur_round: int, MMIX-ARITH §30.
exceptions: int, MMIX-ARITH §32.
exit: void (), <stdlib.h>.
fcomp: int (), MMIX-ARITH §85.
fdivide: octa (), MMIX-ARITH §44.
fepscomp: int (), MMIX-ARITH §50.
fintegerize: octa (), MMIX-ARITH §86.
fixit: octa (), MMIX-ARITH §88.
floatit: octa (), MMIX-ARITH §89.
fmult: octa (), MMIX-ARITH §41.
fplus: octa (), MMIX-ARITH §46.
fprintf: int (), <stdio.h>.
fremstep: octa (), MMIX-ARITH §93.
froot: octa (), MMIX-ARITH §91.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
l: tetra, §10.
load_sf: octa (), MMIX-ARITH §39.
neg_one: octa, MMIX-ARITH §4.
next_char: char *, MMIX-ARITH §69.
oand: octa (), MMIX-ARITH §25.
octa = struct, §10.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
oplus: octa (), MMIX-ARITH §5.
overflow: bool, MMIX-ARITH §4.
print_float: void (), MMIX-ARITH §54.
scan_const: int (), MMIX-ARITH §68.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
signed_odiv: octa (), MMIX-ARITH §24.
signed_omult: octa (), MMIX-ARITH §12.
stderr: FILE *, <stdio.h>.
store_sf: tetra (), MMIX-ARITH §40.
tetra = unsigned int, §10.
val: octa, MMIX-ARITH §69.
wyde_diff: tetra (), MMIX-ARITH §28.
zero_octa: octa, MMIX-ARITH §4.
```

**15.** Binary-to-decimal conversion is used when we want to see an octabyte as a signed integer. The identity  $\lfloor (an+b)/10 \rfloor = \lfloor a/10 \rfloor n + \lfloor ((a \bmod 10)n + b)/10 \rfloor$  is helpful here.

```
#define sign_bit ((unsigned) #80000000)
⟨Subroutines 12⟩ +≡
  void print_int ARGS((octa));
  void print_int(o)
       octa o;
  {
    register tetra hi = o.h, lo = o.l, r, t;
    register int j;
    char dig[20];
    if (lo ≡ 0 ∧ hi ≡ 0) printf("0");
    else {
      if (hi & sign_bit) {
        printf("-");
        if (lo ≡ 0) hi = -hi;
        else lo = -lo, hi = ~hi;
      }
      for (j = 0; hi; j++) {   /* 64-bit division by 10 */
        r = ((hi % 10) ≪ 16) + (lo ≫ 16);
        hi = hi/10;
        t = ((r % 10) ≪ 16) + (lo & #ffff);
        lo = ((r/10) ≪ 16) + (t/10);
        dig[j] = t % 10;
      }
      for (; lo; j++) {
        dig[j] = lo % 10;
        lo = lo/10;
      }
      for (j--; j ≥ 0; j--) printf("%c", dig[j] + '0');
    }
  }
```
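The conversion can be tried out in isolation with the following sketch, which applies the same identity three times per digit (to the high tetrabyte and to the two 16-bit halves of the low tetrabyte) but writes into a string instead of printing. The function name `octa_to_decimal` and the output buffer are assumptions made for illustration; `uint32_t` stands in for **tetra**.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the signed decimal form of the 64-bit
   value (hi, lo) into out, which must hold at least 21 bytes. */
void octa_to_decimal(uint32_t hi, uint32_t lo, char *out)
{
    char dig[20];
    uint32_t r, t;
    int j = 0, k = 0;

    if (hi == 0 && lo == 0) { strcpy(out, "0"); return; }
    if (hi & 0x80000000u) {            /* negative: emit sign, then negate */
        out[k++] = '-';
        if (lo == 0) hi = 0u - hi;
        else { lo = 0u - lo; hi = ~hi; }
    }
    for (; hi; j++) {                  /* divide (hi, lo) by 10 in 16-bit steps */
        r = ((hi % 10) << 16) + (lo >> 16);
        hi /= 10;
        t = ((r % 10) << 16) + (lo & 0xffff);
        lo = ((r / 10) << 16) + (t / 10);
        dig[j] = (char)(t % 10);       /* remainder is the next digit */
    }
    for (; lo; j++) {                  /* the rest fits in 32 bits */
        dig[j] = (char)(lo % 10);
        lo /= 10;
    }
    while (j > 0) out[k++] = (char)('0' + dig[--j]);   /* most significant first */
    out[k] = '\0';
}
```

Every intermediate value stays below $2^{32}$, because a remainder of at most 9 shifted left 16 places plus a 16-bit quantity is less than 655360.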

16. Simulated memory. Chunks of simulated memory, 2048 bytes each, are kept in a tree structure organized as a treap, following ideas of Vuillemin, Aragon, and Seidel [Communications of the ACM 23 (1980), 229–239; IEEE Symp. on Foundations of Computer Science 30 (1989), 540–546]. Each node of the treap has two keys: One, called loc, is the base address of 512 simulated tetrabytes; it follows the conventions of an ordinary binary search tree, with all locations in the left subtree less than the loc of a node and all locations in the right subtree greater than that loc. The other, called stamp, can be thought of as the time the node was inserted into the tree; all subnodes of a given node have a larger stamp. By assigning time stamps at random, we maintain a tree structure that almost always is fairly well balanced.

Each simulated tetrabyte has an associated frequency count and source file reference.

```
⟨Type declarations 9⟩ +≡
  typedef struct {
    tetra tet;   /* the tetrabyte of simulated memory */
    tetra freq;   /* the number of times it was obeyed as an instruction */
    unsigned char bkpt;   /* breakpoint information for this tetrabyte */
    unsigned char file_no;   /* source file number, if known */
    unsigned short line_no;   /* source line number, if known */
  } mem_tetra;
  typedef struct mem_node_struct {
    octa loc;   /* location of the first of 512 simulated tetrabytes */
    tetra stamp;   /* time stamp for treap balancing */
    struct mem_node_struct *left, *right;   /* pointers to subtrees */
    mem_tetra dat[512];   /* the chunk of simulated tetrabytes */
  } mem_node;
```

17. The *stamp* value is actually only pseudorandom, based on the idea of Fibonacci hashing [see *Sorting and Searching*, Section 6.4]. This is good enough for our purposes, and it guarantees that no two stamps will be identical.

```
⟨Subroutines 12⟩ +≡
  mem_node *new_mem ARGS((void));
  mem_node *new_mem()
  {
    register mem_node *p;
    p = (mem_node *) calloc(1, sizeof(mem_node));
    if (¬p) panic("Can't allocate any more memory");
    p→stamp = priority;
    priority += #9e3779b9; /* ⌊2^32(φ − 1)⌋ */
    return p;
  }
```
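The stamp sequence can be tried out in isolation. The sketch below assumes 32-bit unsigned wraparound and uses the seed 314159265 that the global declarations give for the counter; the function name `next_stamp` is hypothetical. Because the increment is odd, the counter visits all 2^32 values before repeating, which is why no two stamps coincide in practice.

```c
#include <stdint.h>

/* Counter for pseudorandom time stamps (illustrative copy, not the simulator's variable). */
static uint32_t stamp_counter = 314159265u;

/* Hypothetical helper: return the current stamp and advance the counter
   by floor(2^32 * (phi - 1)), reduced modulo 2^32 by unsigned overflow. */
uint32_t next_stamp(void)
{
    uint32_t stamp = stamp_counter;
    stamp_counter += 0x9e3779b9u;
    return stamp;
}
```

Successive calls scatter the stamps across the 32-bit range, so nodes created one after another receive widely separated priorities.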

18. Initially we start with a chunk for the pool segment, since the simulator will be putting command line information there before it runs the program.

```
⟨Initialize everything 14⟩ +≡
    mem_root = new_mem();
    mem_root→loc.h = #40000000;
    last_mem = mem_root;
```

**19.**

```
⟨Global variables 19⟩ ≡
    tetra priority = 314159265; /* pseudorandom time stamp counter */
    mem_node *mem_root; /* root of the treap */
    mem_node *last_mem; /* the memory node most recently read or written */
    octa sclock; /* simulated clock */
See also sections 25, 31, 40, 48, 52, 56, 61, 65, 76, 110, 113, 121, 129, 139, 144, and 151.
This code is used in section 141.
```

**20.** The *mem\_find* routine finds a given tetrabyte in the simulated memory, inserting a new node into the treap if necessary.

```
⟨Subroutines 12⟩ +≡
  mem_tetra *mem_find ARGS((octa));
  mem_tetra *mem_find(addr)
    octa addr;
  {
    octa key;
    register int offset;
    register mem_node *p = last_mem;
    key.h = addr.h;
    key.l = addr.l & #fffff800;
    offset = addr.l & #7fc;
    if (p→loc.l ≠ key.l ∨ p→loc.h ≠ key.h)
      ⟨Search for key in the treap, setting last_mem and p to its location 21⟩;
    return &p→dat[offset ≫ 2];
  }
```
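Since each chunk holds 2048 bytes (512 tetrabytes), the low 11 bits of an address select a byte within its chunk and the remaining bits identify the chunk. The following sketch shows that decomposition for the low 32 bits of an address; the names `CHUNK_BYTES`, `chunk_base_low`, and `tetra_index` are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

#define CHUNK_BYTES 2048u   /* 512 tetrabytes of 4 bytes each */

/* Base address (low half) of the chunk containing addr_low: clears the low 11 bits. */
uint32_t chunk_base_low(uint32_t addr_low)
{
    return addr_low & ~(CHUNK_BYTES - 1u);
}

/* Index, from 0 to 511, of the tetrabyte containing addr_low within its chunk.
   The mask CHUNK_BYTES - 4 drops the two lowest bits, so any byte of a
   tetrabyte maps to the same index. */
unsigned tetra_index(uint32_t addr_low)
{
    return (addr_low & (CHUNK_BYTES - 4u)) >> 2;
}
```

For example, address `0x12345` lies in the chunk based at `0x12000`, at tetrabyte index 209.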

**21.**

```
⟨Search for key in the treap, setting last_mem and p to its location 21⟩ ≡
  {
    register mem_node **q;
    for (p = mem_root; p; ) {
      if (key.l ≡ p→loc.l ∧ key.h ≡ p→loc.h) goto found;
      if ((key.l < p→loc.l ∧ key.h ≤ p→loc.h) ∨ key.h < p→loc.h) p = p→left;
      else p = p→right;
    }
    for (p = mem_root, q = &mem_root; p ∧ p→stamp < priority; p = *q) {
      if ((key.l < p→loc.l ∧ key.h ≤ p→loc.h) ∨ key.h < p→loc.h) q = &p→left;
      else q = &p→right;
    }
    *q = new_mem();
    (*q)→loc = key;
    ⟨Fix up the subtrees of *q 22⟩;
    p = *q;
  found: last_mem = p;
  }
```

This code is used in section 20.

**22.** At this point we want to split the binary search tree p into two parts based on the given key, forming the left and right subtrees of the new node q. The effect will be as if key had been inserted before all of p's nodes.

```
⟨Fix up the subtrees of *q 22⟩ ≡
  {
    register mem_node **l = &(*q)→left, **r = &(*q)→right;
    while (p) {
      if ((key.l
```

This code is used in section 21.
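Splitting a binary search tree around a key can be done in one top-down pass that maintains two "hooks," one where the next smaller subtree should be attached and one for the next larger subtree. The sketch below illustrates that general technique with a simplified node type holding a single unsigned key; the type `node` and the function `split` are hypothetical and are not the simulator's own declarations.

```c
#include <stddef.h>

/* Simplified tree node for illustration only. */
typedef struct node {
    unsigned key;
    struct node *left, *right;
} node;

/* Split the search tree p around key (assumed absent from the tree):
   nodes with smaller keys end up under *less, larger keys under *greater. */
void split(node *p, unsigned key, node **less, node **greater)
{
    node **l = less, **r = greater;   /* hooks awaiting the next piece */
    while (p) {
        if (key < p->key) {           /* p and its right subtree are all greater */
            *r = p;
            r = &p->left;             /* smaller candidates may still hide on the left */
            p = *r;
        } else {                      /* p and its left subtree are all less */
            *l = p;
            l = &p->right;
            p = *l;
        }
    }
    *l = *r = NULL;                   /* terminate both pieces */
}
```

Relative order within each piece is unchanged, so if the input satisfies a heap condition on some priority field, both outputs still do.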

23. Loading an object file. To get the user's program into memory, we read in an MMIX object, using modifications of the routines in the utility program MMOtype. Complete details of mmo format appear in the program for MMIXAL; a reader who hopes to understand this section ought to at least skim that documentation. Here we need to define only the basic constants used for interpretation.

```
#define mm #98 /* the escape code of mmo format */
#define lop_quote #0 /* the quotation lopcode */
#define lop_loc #1 /* the location lopcode */
#define lop_skip #2 /* the skip lopcode */
#define lop_fixo #3 /* the octabyte-fix lopcode */
#define lop_fixr #4 /* the relative-fix lopcode */
#define lop_fixrx #5 /* extended relative-fix lopcode */
#define lop_file #6 /* the file name lopcode */
#define lop_line #7 /* the file position lopcode */
#define lop_spec #8 /* the special hook lopcode */
#define lop_pre #9 /* the preamble lopcode */
#define lop_post #a /* the postamble lopcode */
#define lop_stab #b /* the symbol table lopcode */
#define lop_end #c /* the end-it-all lopcode */
```

**24.** We do not load the symbol table. (A more ambitious simulator could implement MMIXAL-style expressions for interactive debugging, but such enhancements are left to the interested reader.)

```
⟨Initialize everything 14⟩ +≡
  mmo_file = fopen(mmo_file_name, "rb");
  if (¬mmo_file) {
    register char *alt_name = (char *) calloc(strlen(mmo_file_name) + 5, sizeof(char));
    if (¬alt_name) panic("Can't allocate file name buffer");
    sprintf(alt_name, "%s.mmo", mmo_file_name);
    mmo_file = fopen(alt_name, "rb");
    if (¬mmo_file) {
      fprintf(stderr, "Can't open the object file %s or %s!\n", mmo_file_name, alt_name);
      exit(-3);
    }
    free(alt_name);
  }
  byte_count = 0;
```

**25.**

```
⟨Global variables 19⟩ +≡
  FILE *mmo_file; /* the input file */
  int postamble; /* have we encountered lop_post? */
  int byte_count; /* index of the next-to-be-read byte */
  byte buf[4]; /* the most recently read bytes */
  int yzbytes; /* the two least significant bytes */
  int delta; /* difference for relative fixup */
  tetra tet; /* buf bytes packed big-endianwise */
```

**26.** The tetrabytes of an mmo file are stored in friendly big-endian fashion, but this program is supposed to work also on computers that are little-endian. Therefore we read four successive bytes and pack them into a tetrabyte, instead of reading a single tetrabyte.

```
#define mmo_err {
    fprintf(stderr, "Bad object file! (Try running MMOtype.)\n");
    exit(-4);
  }
⟨Subroutines 12⟩ +≡
  void read_tet ARGS((void));
  void read_tet()
  {
    if (fread(buf, 1, 4, mmo_file) ≠ 4) mmo_err;
    yzbytes = (buf[2] ≪ 8) + buf[3];
    tet = (((buf[0] ≪ 8) + buf[1]) ≪ 16) + yzbytes;
  }
```

**27.**

```
⟨Subroutines 12⟩ +≡
  byte read_byte ARGS((void));
  byte read_byte()
  {
    register byte b;
    if (¬byte_count) read_tet();
    b = buf[byte_count];
    byte_count = (byte_count + 1) & 3;
    return b;
  }
```

**28.**

```
⟨Load the preamble 28⟩ ≡
  read_tet(); /* read the first tetrabyte of input */
  if (buf[0] ≠ mm ∨ buf[1] ≠ lop_pre) mmo_err;
  if (ybyte ≠ 1) mmo_err;
  if (zbyte ≡ 0) obj_time = #ffffffff;
  else {
    j = zbyte - 1;
    read_tet(); obj_time = tet; /* file creation time */
    for ( ; j > 0; j--) read_tet();
  }
This code is used in section 32.
```
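The endian-independent packing used when reading a tetrabyte can be isolated as a small function. This is a minimal sketch under the assumption that the four bytes have already been read into an array; the name `pack_big_endian` is hypothetical.

```c
#include <stdint.h>

/* Combine four bytes, most significant first, into one 32-bit value.
   The result does not depend on the byte order of the host machine,
   because only shifts and ORs on values are used, never a pointer cast. */
uint32_t pack_big_endian(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

Applied to the bytes `98 09 01 01` (the escape code, the preamble lopcode, and two operand bytes), the function yields `0x98090101` on any host.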

```
ARGS = macro(), §11.
byte = unsigned char, §10.
calloc: void *(), <stdlib.h>.
exit: void (), <stdlib.h>.
FILE, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
fread: size_t (), <stdio.h>.
free: void (), <stdlib.h>.
j: register int, §62.
mmo_file_name = macro, §142.
obj_time: tetra, §31.
panic = macro(), §14.
sprintf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
tetra = unsigned int, §10.
ybyte = macro, §33.
zbyte = macro, §33.
```

**29.**

```
⟨Load the next item 29⟩ ≡
  {
    read_tet();
  loop: if (buf[0] ≡ mm)
      switch (buf[1]) {
      case lop_quote: if (yzbytes ≠ 1) mmo_err;
        read_tet(); break;
      ⟨Cases for lopcodes in the main loop 33⟩
      case lop_post: postamble = 1;
        if (ybyte ∨ zbyte < 32) mmo_err;
        continue;
      default: mmo_err;
      }
    ⟨Load tet as a normal item 30⟩;
  }
This code is used in section 32.
```

**30.** In a normal situation, the newly read tetrabyte is simply supposed to be loaded into the current location. We load not only the current location but also the current file position, if *cur\_line* is nonzero and *cur\_loc* belongs to segment 0.

```
#define mmo_load(loc, val) ll = mem_find(loc), ll→tet ⊕= val
⟨Load tet as a normal item 30⟩ ≡
  {
    mmo_load(cur_loc, tet);
    if (cur_line) {
      ll→file_no = cur_file;
      ll→line_no = cur_line;
      cur_line++;
    }
    cur_loc = incr(cur_loc, 4); cur_loc.l &= -4;
  }
This code is used in section 29.
```

**31.**

```
⟨Global variables 19⟩ +≡
  octa cur_loc; /* the current location */
  int cur_file = -1; /* the most recently selected file number */
  int cur_line; /* the current position in cur_file, if nonzero */
  octa tmp; /* an octabyte of temporary interest */
  tetra obj_time; /* when the object file was created */
```

**32.**

```
⟨Initialize everything 14⟩ +≡
  cur_loc.h = cur_loc.l = 0;
  cur_file = -1;
  cur_line = 0;
  ⟨Load the preamble 28⟩;
  do ⟨Load the next item 29⟩ while (¬postamble);
  ⟨Load the postamble 37⟩;
  fclose(mmo_file);
  cur_line = 0;
```

**33.** We have already implemented *lop\_quote*, which falls through to the normal case after reading an extra tetrabyte. Now let's consider the other lopcodes in turn.

```
#define ybyte buf [2] /* the next-to-least significant byte */
#define zbyte buf [3] /* the least significant byte */
⟨ Cases for lopcodes in the main loop 33⟩ ≡
case lop_loc: if (zbyte ≡ 2) {
    j = ybyte; read_tet(); cur_loc.h = (j ≪ 24) + tet;
} else if (zbyte ≡ 1) cur_loc.h = ybyte ≪ 24;
else mmo_err;
read_tet(); cur_loc.l = tet;
continue;
case lop_skip: cur_loc = incr(cur_loc, yzbytes); continue;
See also sections 34, 35, and 36.
This code is used in section 29.
```

**34.** Fixups load information out of order, when future references have been resolved. The current file name and line number are not considered relevant.

```
⟨Cases for lopcodes in the main loop 33⟩ +≡
case lop_fixo: if (zbyte ≡ 2) {
    j = ybyte; read_tet(); tmp.h = (j ≪ 24) + tet;
  } else if (zbyte ≡ 1) tmp.h = ybyte ≪ 24;
  else mmo_err;
  read_tet(); tmp.l = tet;
  mmo_load(tmp, cur_loc.h);
  mmo_load(incr(tmp, 4), cur_loc.l);
  continue;
case lop_fixr: delta = yzbytes;
  goto fixr;
case lop_fixrx: j = yzbytes; if (j ≠ 16 ∧ j ≠ 24) mmo_err;
  read_tet();
  delta = tet;
  if (delta & #fe000000) mmo_err;
fixr: tmp = incr(cur_loc, -(delta ≥ #1000000 ? (delta & #ffffff) - (1 ≪ j) : delta) ≪ 2);
  mmo_load(tmp, delta);
  continue;
```
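The address arithmetic at label `fixr` is compact, so a worked version may help. The sketch below computes only the signed byte offset that is added to the current location; the function name `fixup_byte_offset` is hypothetical, and the sketch multiplies by 4 instead of shifting, because left-shifting a negative value is undefined in standard C.

```c
#include <stdint.h>

/* Byte offset, relative to the current location, of the tetrabyte to be fixed.
   delta is the distance in tetrabytes as read from the object file;
   j is 16 or 24 and matters only when delta has bit 24 set. */
int64_t fixup_byte_offset(uint32_t delta, int j)
{
    int64_t words;
    if (delta >= 0x1000000u)   /* flagged form: low 24 bits minus 2^j gives a negative distance */
        words = (int64_t)(delta & 0xffffffu) - ((int64_t)1 << j);
    else                       /* plain form: delta itself is the distance */
        words = (int64_t)delta;
    return -4 * words;         /* the fixup lies this many bytes from the current location */
}
```

With a plain `delta` of 3 the target is 12 bytes before the current location; with `delta = 0x100fffe` and `j = 16` the distance is −2 tetrabytes, so the target is 8 bytes after it.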

```
buf: byte [], §25.
delta: int, §25.
fclose: int (), <stdio.h>.
file_no: unsigned char, §16.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
j: register int, §62.
l: tetra, §10.
line_no: unsigned short, §16.
ll: register mem_tetra *, §62.
lop_fixo = #3, §23.
lop_fixr = #4, §23.
lop_fixrx = #5, §23.
lop_loc = #1, §23.
lop_post = #a, §23.
lop_quote = #0, §23.
lop_skip = #2, §23.
mem_find: mem_tetra *(), §20.
mm = #98, §23.
mmo_err = macro, §26.
mmo_file: FILE *, §25.
octa = struct, §10.
postamble: int, §25.
read_tet: void (), §26.
tet: tetra, §16.
tet: tetra, §25.
tetra = unsigned int, §10.
yzbytes: int, §25.
```

**35.** The space for file names isn't allocated until we are sure we need it.

```
⟨Cases for lopcodes in the main loop 33⟩ +≡
case lop_file: if (file_info[ybyte].name) {
    if (zbyte) mmo_err;
    cur_file = ybyte;
  } else {
    if (¬zbyte) mmo_err;
    file_info[ybyte].name = (char *) calloc(4 * zbyte + 1, 1);
    if (¬file_info[ybyte].name) {
      fprintf(stderr, "No room to store the file name!\n"); exit(-5);
    }
    cur_file = ybyte;
    for (j = zbyte, p = file_info[ybyte].name; j > 0; j--, p += 4) {
      read_tet();
      *p = buf[0]; *(p+1) = buf[1]; *(p+2) = buf[2]; *(p+3) = buf[3];
    }
  }
  cur_line = 0; continue;
case lop_line: if (cur_file < 0) mmo_err;
  cur_line = yzbytes; continue;
```

**36.** Special bytes are ignored (at least for now).

```
⟨Cases for lopcodes in the main loop 33⟩ +≡
case lop_spec: while (1) {
    read_tet();
    if (buf[0] ≡ mm) {
      if (buf[1] ≠ lop_quote ∨ yzbytes ≠ 1) goto loop; /* end of special data */
      read_tet();
    }
  }
```

**37.** Since a chunk of memory holds 512 tetrabytes, the *ll* pointer in the following loop stays in the same chunk (namely, the first chunk of segment 3, also known as Stack\_Segment).

```
⟨Load the postamble 37⟩ ≡
  aux.h = #60000000; aux.l = #18;
  ll = mem_find(aux);
  (ll − 1)→tet = 2; /* this will ultimately set rL = 2 */
  (ll − 5)→tet = argc; /* and $0 = argc */
  (ll − 4)→tet = #40000000;
  (ll − 3)→tet = #8; /* and $1 = Pool_Segment + 8 */
  G = zbyte; L = 0;
  for (j = G + G; j < 256 + 256; j++, ll++, aux.l += 4) read_tet(), ll→tet = tet;
  inst_ptr.h = (ll − 2)→tet, inst_ptr.l = (ll − 1)→tet; /* Main */
  (ll + 2 * 12)→tet = G ≪ 24;
  g[255] = incr(aux, 12 * 8); /* we will UNSAVE from here, to get going */
This code is used in section 32.
```

**38.** Loading and printing source lines. The loaded program generally contains cross references to the lines of symbolic source files, so that the context of each instruction can be understood. The following sections of this program make such information available when it is desired.

Source file data is kept in a **file\_node** structure:

```
⟨ Type declarations 9⟩ +≡
typedef struct {
   char *name; /* name of source file */
   int line_count; /* number of lines in the file */
   long *map; /* pointer to map of file positions */
} file_node;
```

**39.** In partial preparation for the day when source files are in Unicode, we define a type **Char** for the source characters.

```
⟨Type declarations 9⟩ +≡
   typedef char Char; /* bytes that will become wydes some day */
```

**40.**

```
⟨Global variables 19⟩ +≡
   file_node file_info[256]; /* data about each source file */
   int buf_size; /* size of buffer for source lines */
   Char *buffer;
```

41. As in MMIXAL, we prefer source lines of length 72 characters or less, but the user is allowed to increase the limit. (Longer lines will silently be truncated to the buffer size when the simulator lists them.)

```
⟨Initialize everything 14⟩ +≡
  if (buf_size < 72) buf_size = 72;
  buffer = (Char *) calloc(buf_size + 1, sizeof(Char));
  if (¬buffer) panic("Can't allocate source line buffer");
```

```
argc: int, §141.
aux: octa, MMIX-ARITH §4.
buf: byte [], §25.
calloc: void *(), <stdlib.h>.
cur_file: int, §31.
cur_line: int, §31.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
g: octa [], §76.
G: register int, §75.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
j: register int, §62.
l: tetra, §10.
L: register int, §75.
ll: register mem_tetra *, §62.
loop: label, §29.
lop_file = #6, §23.
lop_line = #7, §23.
lop_quote = #0, §23.
lop_spec = #8, §23.
mem_find: mem_tetra *(), §20.
mm = #98, §23.
mmo_err = macro, §26.
p: register char *, §62.
panic = macro(), §14.
read_tet: void (), §26.
rL = 20, §55.
stderr: FILE *, <stdio.h>.
tet: tetra, §16.
tet: tetra, §25.
ybyte = macro, §33.
yzbytes: int, §25.
zbyte = macro, §33.
```

**42.** The first time we are called upon to list a line from a given source file, we make a map of starting locations for each line. Source files should contain at most 65535 lines. We assume that they contain no null characters.

```
⟨Subroutines 12⟩ +≡
  void make_map ARGS((void));
  void make_map()
  {
    long map[65536];
    register int k, l;
    register long *p;
    ⟨Check if the source file has been modified 44⟩;
    for (l = 1; l < 65536 ∧ ¬feof(src_file); l++) {
      map[l] = ftell(src_file);
    loop: if (¬fgets(buffer, buf_size, src_file)) break;
      if (buffer[strlen(buffer) − 1] ≠ '\n') goto loop;
    }
    file_info[cur_file].line_count = l;
    file_info[cur_file].map = p = (long *) calloc(l, sizeof(long));
    if (¬p) panic("No room for a source-line map");
    for (k = 1; k < l; k++) p[k] = map[k];
  }
```
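The idea of recording where each line starts, while reading through a buffer that may be shorter than the line, can be sketched independently of the simulator. The function below is a hypothetical illustration: its name `build_line_map`, its zero-based indexing, and its deliberately tiny 16-byte buffer are assumptions that differ from the routine above, which numbers lines from 1 and uses the global source-line buffer.

```c
#include <stdio.h>
#include <string.h>

/* Store the starting file offset of each line of f in map[0], map[1], ...
   (at most max_lines entries) and return the number of lines found.
   The file is assumed to contain no null characters. */
int build_line_map(FILE *f, long *map, int max_lines)
{
    char chunk[16];                 /* small on purpose, to exercise long lines */
    int count = 0;

    rewind(f);
    while (count < max_lines) {
        long start = ftell(f);      /* offset where the next line begins */
        if (!fgets(chunk, sizeof chunk, f)) break;   /* nothing left to read */
        map[count++] = start;
        /* a piece without a trailing newline means the line continues */
        while (chunk[strlen(chunk) - 1] != '\n')
            if (!fgets(chunk, sizeof chunk, f)) break;
    }
    return count;
}
```

Once the map exists, any line can be revisited with `fseek(f, map[k], SEEK_SET)` followed by `fgets`, which is the access pattern the listing routines rely on.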

43. We want to warn the user if the source file has changed since the object file was written. The standard C library doesn't provide the information we need; so we use the UNIX system function stat, in hopes that other operating systems provide a similar way to do the job.
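A check of this kind might look as follows on a POSIX system. This is only a sketch under stated assumptions: the function name `source_modified_since` and its return convention are hypothetical, and the creation time is taken as a count of seconds comparable to `st_mtime`.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return 1 if the file at path was modified after obj_time,
   0 if it was not, and -1 if its status cannot be obtained. */
int source_modified_since(const char *path, unsigned long obj_time)
{
    struct stat st;
    if (stat(path, &st) != 0) return -1;            /* missing or inaccessible file */
    return ((unsigned long)st.st_mtime > obj_time) ? 1 : 0;
}
```

A caller would typically print a warning when the result is 1 and carry on, since a stale listing is still better than none.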

**45.** Source lines are listed by the *print\_line* routine, preceded by 12 characters containing the line number. If a file error occurs, nothing is printed, not even an error message; the absence of listed data is itself a message.

This code is used in section 42.

```
char buf[11];
if (k ≥ file_info[cur_file].line_count) return;
if (fseek(src_file, file_info[cur_file].map[k], SEEK_SET) ≠ 0) return;
if (¬fgets(buffer, buf_size, src_file)) return;
sprintf(buf, "%d:    ", k);
printf("line %.6s %s", buf, buffer);
if (buffer[strlen(buffer) − 1] ≠ '\n') printf("\n");
line_shown = true;
}
```

**46.**

```
⟨Preprocessor macros 11⟩ +≡
#ifndef SEEK_SET
#define SEEK_SET 0 /* code for setting the file pointer to a given offset */
#endif
```

**47.** The *show\_line* routine is called when we want to output line *cur\_line* of source file number *cur\_file*, assuming that *cur\_line* ≠ 0. Its job is primarily to maintain continuity, by opening or reopening the *src\_file* if the source file changes, and by connecting the previously output lines to the new one. Sometimes no output is necessary, because the desired line has already been printed.

```
⟨Subroutines 12⟩ +≡
  void show_line ARGS((void));
  void show_line()
  {
    register int k;
    if (shown_file ≠ cur_file) ⟨Prepare to list lines from a new source file 49⟩
    else if (shown_line ≡ cur_line) return; /* already shown */
    if (cur_line > shown_line + gap + 1 ∨ cur_line < shown_line) {
      if (shown_line > 0)
        if (cur_line < shown_line) printf("-------\n"); /* indicate upward move */
        else printf("     ...\n"); /* indicate the gap */
      print_line(cur_line);
    } else for (k = shown_line + 1; k ≤ cur_line; k++) print_line(k);
    shown_line = cur_line;
  }
```

```
ARGS = macro(), §11.
buf_size: int, §40.
buffer: Char *, §40.
calloc: void *(), <stdlib.h>.
cur_file: int, §31.
cur_line: int, §31.
feof: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
file_info: file_node [], §40.
fprintf: int (), <stdio.h>.
fseek: int (), <stdio.h>.
ftell: long (), <stdio.h>.
gap: int, §48.
line_count: int, §38.
line_shown: bool, §48.
map: long *, §38.
name: char *, §38.
obj_time: tetra, §31.
panic = macro(), §14.
printf: int (), <stdio.h>.
SEEK_SET = macro, <stdio.h>.
shown_file: int, §48.
shown_line: int, §48.
sprintf: int (), <stdio.h>.
src_file: FILE *, §48.
st_mtime: time_t, <sys/stat.h>.
stat: int (), <sys/stat.h>.
stderr: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
tetra = unsigned int, §10.
true = 1, §9.
```

**48.**

```
⟨Global variables 19⟩ +≡
  FILE *src_file; /* the currently open source file */
  int shown_file = -1; /* index of the most recently listed file */
  int shown_line; /* the line most recently listed in shown_file */
  int gap; /* minimum gap between consecutively listed source lines */
  bool line_shown; /* did we list anything recently? */
  bool showing_source; /* are we listing source lines? */
  int profile_gap; /* the gap when printing final frequencies */
  bool profile_showing_source; /* showing_source within final frequencies */
```

**49.**

```
⟨Prepare to list lines from a new source file 49⟩ ≡
  {
    if (¬src_file) src_file = fopen(file_info[cur_file].name, "r");
    else freopen(file_info[cur_file].name, "r", src_file);
    if (¬src_file) {
      fprintf(stderr, "Warning: I can't open file %s; source listing omitted.\n",
          file_info[cur_file].name);
      showing_source = false;
      return;
    }
    printf("\"%s\"\n", file_info[cur_file].name);
    shown_file = cur_file;
    shown_line = 0;
    if (¬file_info[cur_file].map) make_map();
  }
```

This code is used in section 47.

**50.** Here is a simple application of *show\_line*. It is a recursive routine that prints the frequency counts of all instructions that occur in a given subtree of the simulated memory and that were executed at least once. The subtree is traversed in symmetric order; therefore the frequencies appear in increasing order of the instruction locations.

```
⟨Subroutines 12⟩ +≡
  void print_freqs ARGS((mem_node *));
  void print_freqs(p)
    mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p→left) print_freqs(p→left);
    for (j = 0; j < 512; j++)
      if (p→dat[j].freq) ⟨Print frequency data for location p→loc + 4 * j 51⟩;
    if (p→right) print_freqs(p→right);
  }
```

**51.** An ellipsis (...) is printed between frequency data for nonconsecutive instructions, unless source line information intervenes.

```
⟨Print frequency data for location p→loc + 4 * j 51⟩ ≡
  {
    cur_loc = incr(p→loc, 4 * j);
    if (showing_source ∧ p→dat[j].line_no) {
      cur_file = p→dat[j].file_no, cur_line = p→dat[j].line_no;
      line_shown = false;
      show_line();
      if (line_shown) goto loc_implied;
    }
    if (cur_loc.l ≠ implied_loc.l ∨ cur_loc.h ≠ implied_loc.h)
      if (profile_started) printf("         0.        ...\n");
  loc_implied: printf("%10d. %08x%08x: %08x (%s)\n", p→dat[j].freq, cur_loc.h, cur_loc.l,
        p→dat[j].tet, info[p→dat[j].tet ≫ 24].name);
    implied_loc = incr(cur_loc, 4); profile_started = true;
  }
```

This code is used in section 50.

**52.**

```
⟨Global variables 19⟩ +≡
  octa implied_loc; /* location following the last shown frequency data */
  bool profile_started; /* have we printed at least one frequency count? */
```

**53.**

```
⟨Print all the frequency counts 53⟩ ≡
  printf("\nProgram profile:\n");
  shown_file = cur_file = -1; shown_line = cur_line = 0;
  gap = profile_gap;
  showing_source = profile_showing_source;
  implied_loc = neg_one;
  print_freqs(mem_root);
```

This code is used in section 141.

```
freopen: FILE *(), <stdio.h>.
freq: tetra, §16.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
info: op_info [], §65.
l: tetra, §10.
left: mem_node *, §16.
line_no: unsigned short, §16.
loc: octa, §16.
make_map: void (), §42.
map: long *, §38.
mem_node = struct, §16.
mem_root: mem_node *, §19.
name: char *, §38.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §10.
printf: int (), <stdio.h>.
right: mem_node *, §16.
show_line: void (), §47.
stderr: FILE *, <stdio.h>.
tet: tetra, §16.
true = 1, §9.
```

MMIX-SIM: LISTS 360

**54.** Lists. This simulator needs to deal with 256 different opcodes, so we might as well enumerate them now.

```
\langle \text{Type declarations } 9 \rangle + \equiv
  typedef enum {
     TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU,
     FLOT, FLOTI, FLOTU, FLOTUI, SFLOT, SFLOTI, SFLOTU, SFLOTUI,
     FMUL, FCMPE, FUNE, FEQLE, FDIV, FSQRT, FREM, FINT,
     MUL, MULI, MULU, MULUI, DIV, DIVI, DIVU, DIVUI,
     ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI,
     IIADDU, IIADDUI, IVADDU, IVADDUI, VIIIADDU, VIIIADDUI, XVIADDU, XVIADDUI,
     CMP, CMPI, CMPU, CMPUI, NEG, NEGI, NEGU, NEGUI,
     SL, SLI, SLU, SLUI, SR, SRI, SRU, SRUI,
     BN, BNB, BZ, BZB, BP, BPB, BOD, BODB,
     BNN, BNNB, BNZ, BNZB, BNP, BNPB, BEV, BEVB,
     PBN, PBNB, PBZ, PBZB, PBP, PBPB, PBOD, PBODB,
     PBNN, PBNNB, PBNZ, PBNZB, PBNP, PBNPB, PBEV, PBEVB,
     CSN, CSNI, CSZ, CSZI, CSP, CSPI, CSOD, CSODI.
     CSNN, CSNNI, CSNZ, CSNZI, CSNP, CSNPI, CSEV, CSEVI,
     ZSN, ZSNI, ZSZ, ZSZI, ZSP, ZSPI, ZSOD, ZSODI,
     ZSNN, ZSNNI, ZSNZ, ZSNZI, ZSNP, ZSNPI, ZSEV, ZSEVI,
     LDB, LDBI, LDBU, LDBUI, LDW, LDWI, LDWU, LDWUI,
     LDT, LDTI, LDTU, LDTUI, LDO, LDOI, LDOU, LDOUI,
     LDSF, LDSFI, LDHT, LDHTI, CSWAP, CSWAPI, LDUNC, LDUNCI,
     LDVTS, LDVTSI, PRELD, PRELDI, PREGO, PREGOI, GO, GOI,
     STB, STBI, STBU, STBUI, STW, STWI, STWU, STWUI,
     STT, STTI, STTU, STTUI, STO, STOI, STOU, STOUI,
     STSF, STSFI, STHT, STHTI, STCO, STCOI, STUNC, STUNCI,
     SYNCD, SYNCDI, PREST, PRESTI, SYNCID, SYNCIDI, PUSHGO, PUSHGOI,
     OR, ORI, ORN, ORNI, NOR, NORI, XOR, XORI,
     AND, ANDI, ANDN, ANDNI, NAND, NANDI, NXOR, NXORI,
     BDIF, BDIFI, WDIF, WDIFI, TDIF, TDIFI, ODIF, ODIFI,
     MUX, MUXI, SADD, SADDI, MOR, MORI, MXOR, MXORI,
     SETH, SETMH, SETML, SETL, INCH, INCMH, INCML, INCL,
     ORH, ORMH, ORML, ORL, ANDNH, ANDNMH, ANDNML, ANDNL,
     JMP, JMPB, PUSHJ, PUSHJB, GETA, GETAB, PUT, PUTI,
     POP, RESUME, SAVE, UNSAVE, SYNC, SWYM, GET, TRIP
  } mmix_opcode:
     We also need to enumerate the special names for special registers.
\langle \text{Type declarations } 9 \rangle + \equiv
  typedef enum {
     rB, rD, rE, rH, rJ, rM, rR, rBB, rC, rN, rO, rS, rI, rT, rTT, rK, rQ, rU, rV, rG, rL,
          rA, rF, rP, rW, rX, rY, rZ, rWW, rXX, rYY, rZZ
  } special_reg;
56. \langle Global variables 19\rangle + \equiv
  char *special_name[32] = {"rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB", "rC", "rN",
       "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
       "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};
```
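The enumeration of section 55 and the name table of section 56 must list the registers in the same order, because the enumeration value serves as the index into the table. The following minimal sketch illustrates this pairing. The helper `special_reg_name` is hypothetical and not part of MMIX-SIM; the two lists are copied from the sections above.

```c
#include <stddef.h>

/* Codes for the special registers, in the order of section 55. */
typedef enum {
  rB, rD, rE, rH, rJ, rM, rR, rBB, rC, rN, rO, rS, rI, rT, rTT, rK, rQ, rU, rV, rG, rL,
  rA, rF, rP, rW, rX, rY, rZ, rWW, rXX, rYY, rZZ
} special_reg;

/* Names in the same order, as in section 56. */
static const char *special_name[32] = {
  "rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB", "rC", "rN",
  "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
  "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};

/* Hypothetical helper: name of special register number k,
   or NULL if k is not a valid special register code. */
const char *special_reg_name(int k) {
  if (k < 0 || k >= 32) return NULL;
  return special_name[k];
}
```

With this ordering rJ is 4, rM is 5, and rX is 25, which agrees with the values quoted in the mini-indexes.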


**57.** Here are the bit codes for arithmetic exceptions. These codes, except H\_BIT, are defined also in MMIX-ARITH.

```
#define X_BIT (1 << 8)   /* floating inexact */
#define Z_BIT (1 << 9)   /* floating division by zero */
#define U_BIT (1 << 10)  /* floating underflow */
#define O_BIT (1 << 11)  /* floating overflow */
#define I_BIT (1 << 12)  /* floating invalid operation */
#define W_BIT (1 << 13)  /* float-to-fix overflow */
#define V_BIT (1 << 14)  /* integer overflow */
#define D_BIT (1 << 15)  /* integer divide check */
#define H_BIT (1 << 16)  /* trip */
```
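Since each exception has its own bit, several exceptions raised by one instruction can be combined into a single integer with bitwise OR and tested individually with bitwise AND. The sketch below shows this; the helper `exception_letters` and its output format are assumptions made for illustration only and do not appear in MMIX-SIM.

```c
#define X_BIT (1 << 8)   /* floating inexact */
#define Z_BIT (1 << 9)   /* floating division by zero */
#define U_BIT (1 << 10)  /* floating underflow */
#define O_BIT (1 << 11)  /* floating overflow */
#define I_BIT (1 << 12)  /* floating invalid operation */
#define W_BIT (1 << 13)  /* float-to-fix overflow */
#define V_BIT (1 << 14)  /* integer overflow */
#define D_BIT (1 << 15)  /* integer divide check */

/* Hypothetical helper: store in buf (at least 9 bytes) one letter for each
   arithmetic exception bit that is set in exc, from D_BIT down to X_BIT.
   Returns the number of letters stored. Other bits, such as H_BIT, are ignored. */
int exception_letters(int exc, char buf[9]) {
  static const char letters[] = "DVWIOUZX";
  int n = 0;
  for (int k = 0; k < 8; k++)
    if (exc & (D_BIT >> k)) buf[n++] = letters[k];
  buf[n] = '\0';
  return n;
}
```

For example, an instruction that raised both integer overflow and floating division by zero would have `exc` equal to `V_BIT | Z_BIT`, and the helper would report the two letters V and Z.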

**58.** The *bkpt* field associated with each tetrabyte of memory has bits associated with forced tracing and/or breaking for reading, writing, and/or execution.

```
#define trace_bit (1 << 3)
#define read_bit (1 << 2)
#define write_bit (1 << 1)
#define exec_bit (1 << 0)
```
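The instruction-fetch code of section 63 consults two of these bits: *exec\_bit* requests a breakpoint, and *trace\_bit* forces tracing. The following sketch restates that decision as a standalone function. The type `fetch_flags` and the function `classify_fetch` are hypothetical names, and the sketch assumes that no breakpoint was already pending before the fetch.

```c
#include <stdbool.h>

#define trace_bit (1 << 3)
#define read_bit (1 << 2)
#define write_bit (1 << 1)
#define exec_bit (1 << 0)

/* Hypothetical result type for this sketch. */
typedef struct {
  bool breakpoint; /* should we pause after this instruction? */
  bool tracing;    /* should we trace this instruction? */
} fetch_flags;

/* Decide on breaking and tracing from the bkpt bits of a tetrabyte,
   its execution count freq (after incrementing), and the trace threshold. */
fetch_flags classify_fetch(unsigned char bkpt, unsigned int freq,
                           unsigned int trace_threshold) {
  fetch_flags r;
  r.breakpoint = (bkpt & exec_bit) != 0;
  r.tracing = r.breakpoint || (bkpt & trace_bit) != 0 || freq <= trace_threshold;
  return r;
}
```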

**59.** To complete our lists of lists, we enumerate the rudimentary operating system calls that are built in to MMIXAL.

```
#define max_sys_call Ftell

⟨Type declarations 9⟩ +≡
  typedef enum {
    Halt, Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell
  } sys_call;
```

**60.** The main loop. Now let's plunge in to the guts of the simulator, the master switch that controls most of the action.

```
⟨Perform one instruction 60⟩ ≡
  {
    if (resuming) loc = incr(inst_ptr, -4), inst = g[rX].l;
    else ⟨Fetch the next instruction 63⟩;
    op = inst >> 24; xx = (inst >> 16) & 0xff; yy = (inst >> 8) & 0xff; zz = inst & 0xff;
    f = info[op].flags; yz = inst & 0xffff;
    x = y = z = a = b = zero_octa; exc = 0; old_L = L;
    if (f & rel_addr_bit) ⟨Convert relative address to absolute address 70⟩;
    ⟨Install operand fields 71⟩;
    if (f & X_is_dest_bit)
      ⟨Install register X as the destination, adjusting the register stack if necessary 80⟩;
    w = oplus(y, z);
    if (loc.h >= 0x20000000) goto privileged_inst;
    switch (op) {
      ⟨Cases for individual MMIX instructions 84⟩;
    }
    ⟨Check for trip interrupt 122⟩;
    ⟨Update the clocks 127⟩;
    ⟨Trace the current instruction, if requested 128⟩;
    if (resuming && op != RESUME) resuming = false;
  }
```

This code is used in section 141.
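The field extraction at the top of the main loop can be tried in isolation. The sketch below splits a 32-bit instruction word into the opcode and the X, Y, Z, and YZ fields exactly as the shifts and masks above do. The struct `mmix_fields` and the function `split_instruction` are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Hypothetical container for the fields of one instruction. */
typedef struct {
  unsigned op; /* operation code, the most significant byte */
  unsigned xx; /* X field */
  unsigned yy; /* Y field */
  unsigned zz; /* Z field */
  unsigned yz; /* Y and Z fields taken together as 16 bits */
} mmix_fields;

/* Split a tetrabyte into its four one-byte fields, plus the combined YZ. */
mmix_fields split_instruction(uint32_t inst) {
  mmix_fields f;
  f.op = inst >> 24;
  f.xx = (inst >> 16) & 0xff;
  f.yy = (inst >> 8) & 0xff;
  f.zz = inst & 0xff;
  f.yz = inst & 0xffff;
  return f;
}
```

For instance, the word `0x20010203` has opcode `0x20`, which is ADD in the enumeration of section 54, with X = 1, Y = 2, and Z = 3.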

**61.** Operands x and a are usually destinations (results), computed from the source operands y, z, and/or b.

```
⟨Global variables 19⟩ +≡
  octa w, x, y, z, a, b, ma, mb;  /* operands */
  octa *x_ptr;  /* destination */
  octa loc;  /* location of the current instruction */
  octa inst_ptr;  /* location of the next instruction */
  tetra inst;  /* the current instruction */
  int old_L;  /* value of L before the current instruction */
  int exc;  /* exceptions raised by the current instruction */
  int tracing_exceptions;  /* exception bits that cause tracing */
  int rop;  /* ropcode of a resumed instruction */
  int round_mode;  /* the style of floating point rounding just used */
  bool resuming;  /* are we resuming an interrupted instruction? */
  bool halted;  /* did the program come to a halt? */
  bool breakpoint;  /* should we pause after the current instruction? */
  bool tracing;  /* should we trace the current instruction? */
  bool stack_tracing;  /* should we trace details of the register stack? */
  bool interacting;  /* are we in interactive mode? */
  bool interact_after_break;  /* should we go into interactive mode? */
  bool tripping;  /* are we about to go to a trip handler? */
  bool good;  /* did the last branch instruction guess correctly? */
  tetra trace_threshold;  /* each instruction should be traced this many times */
```

```
62. ⟨Local registers 62⟩ ≡
  register mmix_opcode op;  /* operation code of the current instruction */
  register int xx, yy, zz, yz;  /* operand fields of the current instruction */
  register tetra f;  /* properties of the current op */
  register int i, j, k;  /* miscellaneous indices */
  register mem_tetra *ll;  /* current place in the simulated memory */
  register char *p;  /* current place in a string */
```

See also section 75.

This code is used in section 141.

```
63. ⟨Fetch the next instruction 63⟩ ≡
  {
    loc = inst_ptr;
    ll = mem_find(loc);
    inst = ll->tet;
    cur_file = ll->file_no;
    cur_line = ll->line_no;
    ll->freq++;
    if (ll->bkpt & exec_bit) breakpoint = true;
    tracing = breakpoint || (ll->bkpt & trace_bit) || (ll->freq <= trace_threshold);
    inst_ptr = incr(inst_ptr, 4);
  }
```

This code is used in section 60.

**64.** Much of the simulation is table-driven, based on a static data structure called the *op\_info* for each operation code.

```
⟨Type declarations 9⟩ +≡
  typedef struct {
    char *name;  /* symbolic name of an opcode */
    unsigned char flags;  /* its instruction format */
    unsigned char third_operand;  /* its special register input */
    unsigned char mems;  /* how many μ it costs */
    unsigned char oops;  /* how many υ it costs */
    char *trace_format;  /* how it appears when traced */
  } op_info;
```

```
bkpt: unsigned char, §16.
bool = enum, §9.
cur_file: int, §31.
cur_line: int, §31.
exec_bit = macro, §58.
false = 0, §9.
file_no: unsigned char, §16.
freq: tetra, §16.
g: octa [], §76.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
info: op_info [], §65.
l: tetra, §10.
L: register int, §75.
line_no: unsigned short, §16.
mem_find: mem_tetra *(), §20.
mem_tetra = struct, §16.
mmix_opcode = enum, §54.
octa = struct, §10.
oplus: octa (), MMIX-ARITH §5.
privileged_inst: label, §107.
rel_addr_bit = #40, §65.
RESUME = #f9, §54.
rX = 25, §55.
tet: tetra, §16.
tetra = unsigned int, §10.
trace_bit = macro, §58.
true = 1, §9.
X_is_dest_bit = #20, §65.
zero_octa: octa, MMIX-ARITH §4.
```

**65.** For example, the *flags* field of info[op] tells us how to obtain the operands from the X, Y, and Z fields of the current instruction. Each entry records special properties of an operation code, in binary notation: #1 means Z is an immediate value, #2 means rZ is a source operand, #4 means Y is an immediate value, #8 means rY is a source operand, #10 means rX is a source operand, #20 means rX is a destination, #40 means YZ is part of a relative address, #80 means a push or pop or unsave instruction. (These hexadecimal constants are written 0x1, 0x2, and so on in the C code.)

The *trace\_format* field, a character string, will be explained later.

```
#define Z_is_immed_bit 0x1
#define Z_is_source_bit 0x2
#define Y_is_immed_bit 0x4
#define Y_is_source_bit 0x8
#define X_is_source_bit 0x10
#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define push_pop_bit 0x80

⟨Global variables 19⟩ +≡
  op_info info[256] = {⟨Info for arithmetic commands 66⟩, ⟨Info for branch commands 67⟩,
      ⟨Info for load/store commands 68⟩, ⟨Info for logical and control commands 69⟩};

66. ⟨Info for arithmetic commands 66⟩ ≡
  {"TRAP", 0x0a, 255, 0, 5, "%r"},
  {"FCMP", 0x2a, 0, 0, 1, "%l = %.y cmp %.z = %x"},
  {"FUN", 0x2a, 0, 0, 1, "%l = [%.y(||)%.z] = %x"},
  {"FEQL", 0x2a, 0, 0, 1, "%l = [%.y(==)%.z] = %x"},
  {"FADD", 0x2a, 0, 0, 4, "%l = %.y %(+%) %.z = %.x"},
  {"FIX", 0x26, 0, 0, 4, "%l = %(fix%) %.z = %x"},
  {"FSUB", 0x2a, 0, 0, 4, "%l = %.y %(-%) %.z = %.x"},
  {"FIXU", 0x26, 0, 0, 4, "%l = %(fix%) %.z = %#x"},
  {"FLOT", 0x26, 0, 0, 4, "%l = %(flot%) %z = %.x"},
  {"FLOTI", 0x25, 0, 0, 4, "%l = %(flot%) %z = %.x"},
  {"FLOTU", 0x26, 0, 0, 4, "%l = %(flot%) %#z = %.x"},
  {"FLOTUI", 0x25, 0, 0, 4, "%l = %(flot%) %z = %.x"},
  {"SFLOT", 0x26, 0, 0, 4, "%l = %(sflot%) %z = %.x"},
  {"SFLOTI", 0x25, 0, 0, 4, "%l = %(sflot%) %z = %.x"},
  {"SFLOTU", 0x26, 0, 0, 4, "%l = %(sflot%) %#z = %.x"},
  {"SFLOTUI", 0x25, 0, 0, 4, "%l = %(sflot%) %z = %.x"},
  {"FMUL", 0x2a, 0, 0, 4, "%l = %.y %(*%) %.z = %.x"},
  {"FCMPE", 0x2a, rE, 0, 4, "%l = %.y cmp %.z (%.b)) = %x"},
  {"FUNE", 0x2a, rE, 0, 1, "%l = [%.y(||)%.z (%.b)] = %x"},
  {"FEQLE", 0x2a, rE, 0, 4, "%l = [%.y(==)%.z (%.b)] = %x"},
  {"FDIV", 0x2a, 0, 0, 40, "%l = %.y %(/%) %.z = %.x"},
  {"FSQRT", 0x26, 0, 0, 40, "%l = %(sqrt%) %.z = %.x"},
  {"FREM", 0x2a, 0, 0, 4, "%l = %.y %(rem%) %.z = %.x"},
  {"FINT", 0x26, 0, 0, 4, "%l = %(int%) %.z = %.x"},
  {"MUL", 0x2a, 0, 0, 10, "%l = %y * %z = %x"},
  {"MULI", 0x29, 0, 0, 10, "%l = %y * %z = %x"},
  {"MULU", 0x2a, 0, 0, 10, "%l = %#y * %#z = %#x, rH=%#a"},
  {"MULUI", 0x29, 0, 0, 10, "%l = %#y * %z = %#x, rH=%#a"},
  {"DIV", 0x2a, 0, 0, 60, "%l = %y / %z = %x, rR=%a"},
  {"DIVI", 0x29, 0, 0, 60, "%l = %y / %z = %x, rR=%a"},
  {"DIVU", 0x2a, rD, 0, 60, "%l = %#b%0y / %#z = %#x, rR=%#a"},
  {"DIVUI", 0x29, rD, 0, 60, "%l = %#b%0y / %z = %#x, rR=%#a"},
  {"ADD", 0x2a, 0, 0, 1, "%l = %y + %z = %x"},
  {"ADDI", 0x29, 0, 0, 1, "%l = %y + %z = %x"},
  {"ADDU", 0x2a, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"ADDUI", 0x29, 0, 0, 1, "%l = %#y + %z = %#x"},
  {"SUB", 0x2a, 0, 0, 1, "%l = %y - %z = %x"},
  {"SUBI", 0x29, 0, 0, 1, "%l = %y - %z = %x"},
  {"SUBU", 0x2a, 0, 0, 1, "%l = %#y - %#z = %#x"},
  {"SUBUI", 0x29, 0, 0, 1, "%l = %#y - %z = %#x"},
  {"2ADDU", 0x2a, 0, 0, 1, "%l = %#y <<1+ %#z = %#x"},
  {"2ADDUI", 0x29, 0, 0, 1, "%l = %#y <<1+ %z = %#x"},
  {"4ADDU", 0x2a, 0, 0, 1, "%l = %#y <<2+ %#z = %#x"},
  {"4ADDUI", 0x29, 0, 0, 1, "%l = %#y <<2+ %z = %#x"},
  {"8ADDU", 0x2a, 0, 0, 1, "%l = %#y <<3+ %#z = %#x"},
  {"8ADDUI", 0x29, 0, 0, 1, "%l = %#y <<3+ %z = %#x"},
  {"16ADDU", 0x2a, 0, 0, 1, "%l = %#y <<4+ %#z = %#x"},
  {"16ADDUI", 0x29, 0, 0, 1, "%l = %#y <<4+ %z = %#x"},
  {"CMP", 0x2a, 0, 0, 1, "%l = %y cmp %z = %x"},
  {"CMPI", 0x29, 0, 0, 1, "%l = %y cmp %z = %x"},
  {"CMPU", 0x2a, 0, 0, 1, "%l = %#y cmp %#z = %x"},
  {"CMPUI", 0x29, 0, 0, 1, "%l = %#y cmp %z = %x"},
  {"NEG", 0x26, 0, 0, 1, "%l = %y - %z = %x"},
  {"NEGI", 0x25, 0, 0, 1, "%l = %y - %z = %x"},
  {"NEGU", 0x26, 0, 0, 1, "%l = %y - %#z = %#x"},
  {"NEGUI", 0x25, 0, 0, 1, "%l = %y - %z = %#x"},
  {"SL", 0x2a, 0, 0, 1, "%l = %y << %#z = %x"},
  {"SLI", 0x29, 0, 0, 1, "%l = %y << %z = %x"},
  {"SLU", 0x2a, 0, 0, 1, "%l = %#y << %#z = %#x"},
  {"SLUI", 0x29, 0, 0, 1, "%l = %#y << %z = %#x"},
  {"SR", 0x2a, 0, 0, 1, "%l = %y >> %#z = %x"},
  {"SRI", 0x29, 0, 0, 1, "%l = %y >> %z = %x"},
  {"SRU", 0x2a, 0, 0, 1, "%l = %#y >> %#z = %#x"},
  {"SRUI", 0x29, 0, 0, 1, "%l = %#y >> %z = %#x"}
```

This code is used in section 65.
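The second column of each table entry is the *flags* byte of section 65. As a reading aid, the sketch below classifies the Y and Z operands of an instruction from its flags, testing the immediate bit before the register bit in the same order as section 71. The enumeration `operand_kind` and the three functions are hypothetical and not part of MMIX-SIM; wyde immediates and relative addresses are handled separately by the simulator and are reported here as "none".

```c
#define Z_is_immed_bit 0x1
#define Z_is_source_bit 0x2
#define Y_is_immed_bit 0x4
#define Y_is_source_bit 0x8
#define X_is_source_bit 0x10
#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define push_pop_bit 0x80

/* Hypothetical classification of one operand field. */
typedef enum { OPERAND_NONE, OPERAND_IMMEDIATE, OPERAND_REGISTER } operand_kind;

/* How is the Z field used, according to the flags f? */
operand_kind z_kind(unsigned char f) {
  if (f & Z_is_immed_bit) return OPERAND_IMMEDIATE;
  if (f & Z_is_source_bit) return OPERAND_REGISTER;
  return OPERAND_NONE;
}

/* How is the Y field used, according to the flags f? */
operand_kind y_kind(unsigned char f) {
  if (f & Y_is_immed_bit) return OPERAND_IMMEDIATE;
  if (f & Y_is_source_bit) return OPERAND_REGISTER;
  return OPERAND_NONE;
}

/* Does the instruction store a result in register X? */
int writes_x(unsigned char f) { return (f & X_is_dest_bit) != 0; }
```

Thus the value 0x2a used for ADD means "rX is a destination, rY and rZ are sources", while 0x29 for ADDI differs only in treating Z as an immediate value, and 0x26 for NEG treats Y as immediate and rZ as a source.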

```
67. ⟨Info for branch commands 67⟩ ≡
  {"BN", 0x50, 0, 0, 1, "%b<0? %t%g"},
  {"BNB", 0x50, 0, 0, 1, "%b<0? %t%g"},
  {"BZ", 0x50, 0, 0, 1, "%b==0? %t%g"},
  {"BZB", 0x50, 0, 0, 1, "%b==0? %t%g"},
  {"BP", 0x50, 0, 0, 1, "%b>0? %t%g"},
  {"BPB", 0x50, 0, 0, 1, "%b>0? %t%g"},
  {"BOD", 0x50, 0, 0, 1, "%b odd? %t%g"},
  {"BODB", 0x50, 0, 0, 1, "%b odd? %t%g"},
  {"BNN", 0x50, 0, 0, 1, "%b>=0? %t%g"},
  {"BNNB", 0x50, 0, 0, 1, "%b>=0? %t%g"},
  {"BNZ", 0x50, 0, 0, 1, "%b!=0? %t%g"},
  {"BNZB", 0x50, 0, 0, 1, "%b!=0? %t%g"},
  {"BNP", 0x50, 0, 0, 1, "%b<=0? %t%g"},
  {"BNPB", 0x50, 0, 0, 1, "%b<=0? %t%g"},
  {"BEV", 0x50, 0, 0, 1, "%b even? %t%g"},
  {"BEVB", 0x50, 0, 0, 1, "%b even? %t%g"},
  {"PBN", 0x50, 0, 0, 1, "%b<0? %t%g"},
  {"PBNB", 0x50, 0, 0, 1, "%b<0? %t%g"},
  {"PBZ", 0x50, 0, 0, 1, "%b==0? %t%g"},
  {"PBZB", 0x50, 0, 0, 1, "%b==0? %t%g"},
  {"PBP", 0x50, 0, 0, 1, "%b>0? %t%g"},
  {"PBPB", 0x50, 0, 0, 1, "%b>0? %t%g"},
  {"PBOD", 0x50, 0, 0, 1, "%b odd? %t%g"},
  {"PBODB", 0x50, 0, 0, 1, "%b odd? %t%g"},
  {"PBNN", 0x50, 0, 0, 1, "%b>=0? %t%g"},
  {"PBNNB", 0x50, 0, 0, 1, "%b>=0? %t%g"},
  {"PBNZ", 0x50, 0, 0, 1, "%b!=0? %t%g"},
  {"PBNZB", 0x50, 0, 0, 1, "%b!=0? %t%g"},
  {"PBNP", 0x50, 0, 0, 1, "%b<=0? %t%g"},
  {"PBNPB", 0x50, 0, 0, 1, "%b<=0? %t%g"},
  {"PBEV", 0x50, 0, 0, 1, "%b even? %t%g"},
  {"PBEVB", 0x50, 0, 0, 1, "%b even? %t%g"},
  {"CSN", 0x3a, 0, 0, 1, "%l = %y<0? %z: %b = %x"},
  {"CSNI", 0x39, 0, 0, 1, "%l = %y<0? %z: %b = %x"},
  {"CSZ", 0x3a, 0, 0, 1, "%l = %y==0? %z: %b = %x"},
  {"CSZI", 0x39, 0, 0, 1, "%l = %y==0? %z: %b = %x"},
  {"CSP", 0x3a, 0, 0, 1, "%l = %y>0? %z: %b = %x"},
  {"CSPI", 0x39, 0, 0, 1, "%l = %y>0? %z: %b = %x"},
  {"CSOD", 0x3a, 0, 0, 1, "%l = %y odd? %z: %b = %x"},
  {"CSODI", 0x39, 0, 0, 1, "%l = %y odd? %z: %b = %x"},
  {"CSNN", 0x3a, 0, 0, 1, "%l = %y>=0? %z: %b = %x"},
  {"CSNNI", 0x39, 0, 0, 1, "%l = %y>=0? %z: %b = %x"},
  {"CSNZ", 0x3a, 0, 0, 1, "%l = %y!=0? %z: %b = %x"},
  {"CSNZI", 0x39, 0, 0, 1, "%l = %y!=0? %z: %b = %x"},
  {"CSNP", 0x3a, 0, 0, 1, "%l = %y<=0? %z: %b = %x"},
  {"CSNPI", 0x39, 0, 0, 1, "%l = %y<=0? %z: %b = %x"},
  {"CSEV", 0x3a, 0, 0, 1, "%l = %y even? %z: %b = %x"},
  {"CSEVI", 0x39, 0, 0, 1, "%l = %y even? %z: %b = %x"},
  {"ZSN", 0x2a, 0, 0, 1, "%l = %y<0? %z: 0 = %x"},
  {"ZSNI", 0x29, 0, 0, 1, "%l = %y<0? %z: 0 = %x"},
  {"ZSZ", 0x2a, 0, 0, 1, "%l = %y==0? %z: 0 = %x"},
  {"ZSZI", 0x29, 0, 0, 1, "%l = %y==0? %z: 0 = %x"},
  {"ZSP", 0x2a, 0, 0, 1, "%l = %y>0? %z: 0 = %x"},
  {"ZSPI", 0x29, 0, 0, 1, "%l = %y>0? %z: 0 = %x"},
  {"ZSOD", 0x2a, 0, 0, 1, "%l = %y odd? %z: 0 = %x"},
  {"ZSODI", 0x29, 0, 0, 1, "%l = %y odd? %z: 0 = %x"},
  {"ZSNN", 0x2a, 0, 0, 1, "%l = %y>=0? %z: 0 = %x"},
  {"ZSNNI", 0x29, 0, 0, 1, "%l = %y>=0? %z: 0 = %x"},
  {"ZSNZ", 0x2a, 0, 0, 1, "%l = %y!=0? %z: 0 = %x"},
  {"ZSNZI", 0x29, 0, 0, 1, "%l = %y!=0? %z: 0 = %x"},
  {"ZSNP", 0x2a, 0, 0, 1, "%l = %y<=0? %z: 0 = %x"},
  {"ZSNPI", 0x29, 0, 0, 1, "%l = %y<=0? %z: 0 = %x"},
  {"ZSEV", 0x2a, 0, 0, 1, "%l = %y even? %z: 0 = %x"},
  {"ZSEVI", 0x29, 0, 0, 1, "%l = %y even? %z: 0 = %x"}
```

This code is used in section 65.

```
68. ⟨Info for load/store commands 68⟩ ≡
  {"LDB", 0x2a, 0, 1, 1, "%l = M1[%#y+%#z] = %x"},
  {"LDBI", 0x29, 0, 1, 1, "%l = M1[%#y%?+] = %x"},
  {"LDBU", 0x2a, 0, 1, 1, "%l = M1[%#y+%#z] = %#x"},
  {"LDBUI", 0x29, 0, 1, 1, "%l = M1[%#y%?+] = %#x"},
  {"LDW", 0x2a, 0, 1, 1, "%l = M2[%#y+%#z] = %x"},
  {"LDWI", 0x29, 0, 1, 1, "%l = M2[%#y%?+] = %x"},
  {"LDWU", 0x2a, 0, 1, 1, "%l = M2[%#y+%#z] = %#x"},
  {"LDWUI", 0x29, 0, 1, 1, "%l = M2[%#y%?+] = %#x"},
  {"LDT", 0x2a, 0, 1, 1, "%l = M4[%#y+%#z] = %x"},
  {"LDTI", 0x29, 0, 1, 1, "%l = M4[%#y%?+] = %x"},
  {"LDTU", 0x2a, 0, 1, 1, "%l = M4[%#y+%#z] = %#x"},
  {"LDTUI", 0x29, 0, 1, 1, "%l = M4[%#y%?+] = %#x"},
  {"LDO", 0x2a, 0, 1, 1, "%l = M8[%#y+%#z] = %x"},
  {"LDOI", 0x29, 0, 1, 1, "%l = M8[%#y%?+] = %x"},
  {"LDOU", 0x2a, 0, 1, 1, "%l = M8[%#y+%#z] = %#x"},
  {"LDOUI", 0x29, 0, 1, 1, "%l = M8[%#y%?+] = %#x"},
  {"LDSF", 0x2a, 0, 1, 1, "%l = (M4[%#y+%#z]) = %.x"},
  {"LDSFI", 0x29, 0, 1, 1, "%l = (M4[%#y%?+]) = %.x"},
  {"LDHT", 0x2a, 0, 1, 1, "%l = M4[%#y+%#z]<<32 = %#x"},
  {"LDHTI", 0x29, 0, 1, 1, "%l = M4[%#y%?+]<<32 = %#x"},
  {"CSWAP", 0x3a, 0, 2, 2, "%l = [M8[%#y+%#z]==%a] = %x, %r"},
  {"CSWAPI", 0x39, 0, 2, 2, "%l = [M8[%#y%?+]==%a] = %x, %r"},
  {"LDUNC", 0x2a, 0, 1, 1, "%l = M8[%#y+%#z] = %#x"},
  {"LDUNCI", 0x29, 0, 1, 1, "%l = M8[%#y%?+] = %#x"},
  {"LDVTS", 0x2a, 0, 0, 1, ""},
  {"LDVTSI", 0x29, 0, 0, 1, ""},
  {"PRELD", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"PRELDI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"PREGO", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"PREGOI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"GO", 0x2a, 0, 0, 3, "%l = %#x, -> %#y+%#z"},
  {"GOI", 0x29, 0, 0, 3, "%l = %#x, -> %#y%?+"},
  {"STB", 0x1a, 0, 1, 1, "M1[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STBI", 0x19, 0, 1, 1, "M1[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STBU", 0x1a, 0, 1, 1, "M1[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STBUI", 0x19, 0, 1, 1, "M1[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STW", 0x1a, 0, 1, 1, "M2[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STWI", 0x19, 0, 1, 1, "M2[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STWU", 0x1a, 0, 1, 1, "M2[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STWUI", 0x19, 0, 1, 1, "M2[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STT", 0x1a, 0, 1, 1, "M4[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STTI", 0x19, 0, 1, 1, "M4[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STTU", 0x1a, 0, 1, 1, "M4[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STTUI", 0x19, 0, 1, 1, "M4[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STO", 0x1a, 0, 1, 1, "M8[%#y+%#z] = %b"},
  {"STOI", 0x19, 0, 1, 1, "M8[%#y%?+] = %b"},
  {"STOU", 0x1a, 0, 1, 1, "M8[%#y+%#z] = %#b"},
  {"STOUI", 0x19, 0, 1, 1, "M8[%#y%?+] = %#b"},
  {"STSF", 0x1a, 0, 1, 1, "%(M4[%#y+%#z]%) = %.b, M8[%#w]=%#a"},
  {"STSFI", 0x19, 0, 1, 1, "%(M4[%#y%?+]%) = %.b, M8[%#w]=%#a"},
  {"STHT", 0x1a, 0, 1, 1, "M4[%#y+%#z] = %#b>>32, M8[%#w]=%#a"},
  {"STHTI", 0x19, 0, 1, 1, "M4[%#y%?+] = %#b>>32, M8[%#w]=%#a"},
  {"STCO", 0x0a, 0, 1, 1, "M8[%#y+%#z] = %b"},
  {"STCOI", 0x09, 0, 1, 1, "M8[%#y%?+] = %b"},
  {"STUNC", 0x1a, 0, 1, 1, "M8[%#y+%#z] = %#b"},
  {"STUNCI", 0x19, 0, 1, 1, "M8[%#y%?+] = %#b"},
  {"SYNCD", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"SYNCDI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"PREST", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"PRESTI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"SYNCID", 0x0a, 0, 0, 1, "[%#y+%#z .. %#x]"},
  {"SYNCIDI", 0x09, 0, 0, 1, "[%#y%?+ .. %#x]"},
  {"PUSHGO", 0xaa, 0, 0, 3, "%lrO=%#b, rL=%a, rJ=%#x, -> %#y+%#z"},
  {"PUSHGOI", 0xa9, 0, 0, 3, "%lrO=%#b, rL=%a, rJ=%#x, -> %#y%?+"}
```

This code is used in section 65.
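The *mems* and *oops* columns give the nominal cost of each instruction in units of μ (memory references) and υ (clock cycles). The sketch below totals these nominal costs over a short list of instructions. It is only an illustration: the types `op_cost` and `cost` and the function `total_cost` are hypothetical, the sample entries copy just the two cost columns from the tables above, and any further charges that the simulator makes elsewhere are ignored.

```c
/* Hypothetical subset of op_info: a name and its two cost fields. */
typedef struct {
  const char *name;
  unsigned char mems; /* how many mu it costs */
  unsigned char oops; /* how many upsilon it costs */
} op_cost;

/* Hypothetical accumulated cost. */
typedef struct {
  unsigned long mems;
  unsigned long oops;
} cost;

/* Add up the nominal costs of n instructions. */
cost total_cost(const op_cost *ops, int n) {
  cost c = {0, 0};
  for (int k = 0; k < n; k++) {
    c.mems += ops[k].mems;
    c.oops += ops[k].oops;
  }
  return c;
}
```

For example, a sequence LDB, DIV, ADD, STB has a nominal cost of 2μ + 63υ, since the loads and stores cost μ + υ each, DIV costs 60υ, and ADD costs υ.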

```
69. ⟨Info for logical and control commands 69⟩ ≡
  {"OR", 0x2a, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ORI", 0x29, 0, 0, 1, "%l = %#y | %z = %#x"},
  {"ORN", 0x2a, 0, 0, 1, "%l = %#y |~ %#z = %#x"},
  {"ORNI", 0x29, 0, 0, 1, "%l = %#y |~ %z = %#x"},
  {"NOR", 0x2a, 0, 0, 1, "%l = %#y ~| %#z = %#x"},
  {"NORI", 0x29, 0, 0, 1, "%l = %#y ~| %z = %#x"},
  {"XOR", 0x2a, 0, 0, 1, "%l = %#y ^ %#z = %#x"},
  {"XORI", 0x29, 0, 0, 1, "%l = %#y ^ %z = %#x"},
  {"AND", 0x2a, 0, 0, 1, "%l = %#y & %#z = %#x"},
  {"ANDI", 0x29, 0, 0, 1, "%l = %#y & %z = %#x"},
  {"ANDN", 0x2a, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"ANDNI", 0x29, 0, 0, 1, "%l = %#y \\ %z = %#x"},
  {"NAND", 0x2a, 0, 0, 1, "%l = %#y ~& %#z = %#x"},
  {"NANDI", 0x29, 0, 0, 1, "%l = %#y ~& %z = %#x"},
  {"NXOR", 0x2a, 0, 0, 1, "%l = %#y ~^ %#z = %#x"},
  {"NXORI", 0x29, 0, 0, 1, "%l = %#y ~^ %z = %#x"},
  {"BDIF", 0x2a, 0, 0, 1, "%l = %#y bdif %#z = %#x"},
  {"BDIFI", 0x29, 0, 0, 1, "%l = %#y bdif %z = %#x"},
  {"WDIF", 0x2a, 0, 0, 1, "%l = %#y wdif %#z = %#x"},
  {"WDIFI", 0x29, 0, 0, 1, "%l = %#y wdif %z = %#x"},
  {"TDIF", 0x2a, 0, 0, 1, "%l = %#y tdif %#z = %#x"},
  {"TDIFI", 0x29, 0, 0, 1, "%l = %#y tdif %z = %#x"},
  {"ODIF", 0x2a, 0, 0, 1, "%l = %#y odif %#z = %#x"},
  {"ODIFI", 0x29, 0, 0, 1, "%l = %#y odif %z = %#x"},
  {"MUX", 0x2a, rM, 0, 1, "%l = %#b? %#y: %#z = %#x"},
  {"MUXI", 0x29, rM, 0, 1, "%l = %#b? %#y: %z = %#x"},
  {"SADD", 0x2a, 0, 0, 1, "%l = nu(%#y\\%#z) = %x"},
  {"SADDI", 0x29, 0, 0, 1, "%l = nu(%#y\\%z) = %x"},
  {"MOR", 0x2a, 0, 0, 1, "%l = %#y mor %#z = %#x"},
  {"MORI", 0x29, 0, 0, 1, "%l = %#y mor %z = %#x"},
  {"MXOR", 0x2a, 0, 0, 1, "%l = %#y mxor %#z = %#x"},
  {"MXORI", 0x29, 0, 0, 1, "%l = %#y mxor %z = %#x"},
  {"SETH", 0x20, 0, 0, 1, "%l = %#z"},
  {"SETMH", 0x20, 0, 0, 1, "%l = %#z"},
  {"SETML", 0x20, 0, 0, 1, "%l = %#z"},
  {"SETL", 0x20, 0, 0, 1, "%l = %#z"},
  {"INCH", 0x30, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"INCMH", 0x30, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"INCML", 0x30, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"INCL", 0x30, 0, 0, 1, "%l = %#y + %#z = %#x"},
  {"ORH", 0x30, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ORMH", 0x30, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ORML", 0x30, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ORL", 0x30, 0, 0, 1, "%l = %#y | %#z = %#x"},
  {"ANDNH", 0x30, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"ANDNMH", 0x30, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"ANDNML", 0x30, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"ANDNL", 0x30, 0, 0, 1, "%l = %#y \\ %#z = %#x"},
  {"JMP", 0x40, 0, 0, 1, "-> %#z"},
  {"JMPB", 0x40, 0, 0, 1, "-> %#z"},
  {"PUSHJ", 0xe0, 0, 0, 1, "%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},
  {"PUSHJB", 0xe0, 0, 0, 1, "%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},
  {"GETA", 0x60, 0, 0, 1, "%l = %#z"},
  {"GETAB", 0x60, 0, 0, 1, "%l = %#z"},
  {"PUT", 0x02, 0, 0, 1, "%s = %r"},
  {"PUTI", 0x01, 0, 0, 1, "%s = %r"},
  {"POP", 0x80, rJ, 0, 3, "%lrL=%a, rO=%#b, -> %#y%?+"},
  {"RESUME", 0x00, 0, 0, 5, "{%#b} -> %#z"},
  {"SAVE", 0x20, 0, 20, 1, "%l = %#x"},
  {"UNSAVE", 0x82, 0, 20, 1, "%#z: rG=%x, ..., rL=%a"},
  {"SYNC", 0x01, 0, 0, 1, ""},
  {"SWYM", 0x00, 0, 0, 1, ""},
  {"GET", 0x20, 0, 0, 1, "%l = %s = %#x"},
  {"TRIP", 0x0a, 255, 0, 5, "rW=%#w, rX=%#x, rY=%#y, rZ=%#z, rB=%#b, g[255]=%#a"}
```

This code is used in section 65.

```
70. ⟨Convert relative address to absolute address 70⟩ ≡
  {
    if ((op & 0xfe) == JMP) yz = inst & 0xffffff;
    if (op & 1) yz -= (op == JMPB ? 0x1000000 : 0x10000);
    y = inst_ptr; z = incr(loc, yz << 2);
  }
```

This code is used in section 60.

```
71. ⟨Install operand fields 71⟩ ≡
  if (resuming && rop != RESUME_AGAIN)
    ⟨Install special operands when resuming an interrupted operation 126⟩
  else {
    if (f & 0x10) ⟨Set b from register X 74⟩;
    if (info[op].third_operand) ⟨Set b from special register 79⟩;
    if (f & 0x1) z.l = zz;
    else if (f & 0x2) ⟨Set z from register Z 72⟩
    else if ((op & 0xf0) == SETH) ⟨Set z as an immediate wyde 78⟩;
    if (f & 0x4) y.l = yy;
    else if (f & 0x8) ⟨Set y from register Y 73⟩;
  }
```

This code is used in section 60.
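The address conversion of section 70 can be checked with ordinary 64-bit arithmetic. In the sketch below the simulator's **octa** pairs and the *incr* routine are replaced by `uint64_t` addition, which is an assumption made for brevity; the function name `branch_target` is likewise hypothetical. An odd opcode denotes the backward form of a branch or jump, so the field is reduced by 2^16 (or by 2^24 for JMPB) before being scaled by 4.

```c
#include <stdint.h>

#define JMP 0xf0
#define JMPB 0xf1

/* Sketch: absolute target of a relative-address instruction inst located at loc.
   Uses plain 64-bit integers in place of the simulator's octa type. */
uint64_t branch_target(uint64_t loc, uint32_t inst) {
  unsigned op = inst >> 24;
  int64_t yz = inst & 0xffff;                 /* 16-bit field for branches */
  if ((op & 0xfe) == JMP) yz = inst & 0xffffff; /* 24-bit field for JMP, JMPB */
  if (op & 1) yz -= (op == JMPB ? 0x1000000 : 0x10000); /* backward forms */
  return loc + (uint64_t)(yz * 4);            /* offsets count tetrabytes */
}
```

For example, a BZ instruction (opcode 0x42) at location 0x100 with YZ = 3 targets 0x10c, while BZB (opcode 0x43) with YZ = 0xffff targets 0xfc, one tetrabyte before the instruction itself.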

```
b: octa, §61.
f: register tetra, §62.
incr: octa (), MMIX-ARITH §6.
info: op_info [], §65.
inst: tetra, §61.
inst_ptr: octa, §61.
JMP = #f0, §54.
JMPB = #f1, §54.
l: tetra, §10.
loc: octa, §61.
op: register mmix_opcode, §62.
RESUME_AGAIN = 0, §125.
resuming: bool, §61.
rJ = 4, §55.
rM = 5, §55.
rop: int, §61.
SETH = #e0, §54.
third_operand: unsigned char, §64.
y: octa, §61.
yy: register int, §62.
yz: register int, §62.
z: octa, §61.
zz: register int, §62.
```

**72.** There are 256 global registers, g[0] through g[255]; the first 32 of them are used for the special registers rA, rB, etc. There are  $lring\_mask + 1$  local registers, usually 256 but the user can increase this to a larger power of 2 if desired.

The current values of rL, rG, rO, and rS are kept in separate variables called L, G, O, and S for convenience. (In fact, O and S actually hold the values rO/8 and rS/8, modulo $lring\_size$.)

```
⟨Set z from register Z 72⟩ ≡
  {
    if (zz >= G) z = g[zz];
    else if (zz < L) z = l[(O + zz) & lring_mask];
  }
This code is used in section 71.
73. ⟨Set y from register Y 73⟩ ≡
  {
    if (yy >= G) y = g[yy];
    else if (yy < L) y = l[(O + yy) & lring_mask];
  }
This code is used in section 71.
74. ⟨Set b from register X 74⟩ ≡
  {
    if (xx >= G) b = g[xx];
    else if (xx < L) b = l[(O + xx) & lring_mask];
  }
This code is used in section 71.
75. ⟨Local registers 62⟩ +≡
  register int G, L, O;  /* accessible copies of key registers */
76. ⟨Global variables 19⟩ +≡
  octa g[256];  /* global registers */
  octa *l;  /* local registers */
  int lring_size;  /* the number of local registers (a power of 2) */
  int lring_mask;  /* one less than lring_size */
  int S;  /* congruent to rS >> 3 modulo lring_size */
```
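The three lookups of §72 through §74 follow one pattern, which can be sketched as a single helper. This is only an illustration: the `regfile` struct, the fixed 256-entry ring, and the name `read_reg` are assumptions made for the example, and returning zero for a register that is neither global nor local assumes that the operand was cleared beforehand.

```c
#include <stdint.h>

/* Assumed stand-in for the simulator's octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa;

enum { RING_SIZE = 256, RING_MASK = RING_SIZE - 1 }; /* hypothetical fixed ring */

typedef struct {
    octa g[256];        /* global registers */
    octa l[RING_SIZE];  /* ring of local registers */
    int G, L, O;        /* copies of rG, rL, and rO/8 modulo RING_SIZE */
} regfile;

/* Read register k: global if k >= G, local if k < L, otherwise zero. */
octa read_reg(const regfile *r, int k)
{
    octa v = {0, 0};
    if (k >= r->G) v = r->g[k];
    else if (k < r->L) v = r->l[(r->O + k) & RING_MASK];
    return v;
}
```

With O = 255 and L = 2, register $0 lives in `l[255]` and $1 wraps around to `l[0]`, which is why the index is masked.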

**77.** Several of the global registers have constant values, because of the way MMIX has been simplified in this simulator.

Special register rN has a constant value identifying the time of compilation. (The macro ABSTIME is defined externally in the file abstime.h, which should have just been created by ABSTIME; ABSTIME is a trivial program that computes the value of the standard library function  $time(\Lambda)$ . We assume that this number, which is the number of seconds in the "UNIX epoch," is less than  $2^{32}$ . Beware: Our assumption will fail in February of 2106.)
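The February 2106 date can be checked by converting $2^{32}$ seconds after the start of 1970 into a calendar date. The sketch below does this with plain Gregorian calendar arithmetic instead of the C library, so that it does not depend on the width of `time_t`; the type `utc_time` and the function `utc_from_unix` are names invented for this example.

```c
#include <stdint.h>

typedef struct { int year, month, day, hour, min, sec; } utc_time;

static int is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Convert seconds since 1970-01-01 00:00:00 UTC to a calendar date. */
utc_time utc_from_unix(uint64_t t)
{
    static const int mdays[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
    utc_time r;
    uint64_t days = t / 86400, rem = t % 86400;
    int m;
    r.hour = (int)(rem / 3600);
    r.min = (int)(rem % 3600 / 60);
    r.sec = (int)(rem % 60);
    r.year = 1970;
    for (;;) {                      /* peel off whole years */
        uint64_t n = is_leap(r.year) ? 366 : 365;
        if (days < n) break;
        days -= n; r.year++;
    }
    r.month = 1;
    for (m = 0; m < 12; m++) {      /* peel off whole months */
        uint64_t n = (uint64_t)(mdays[m] + (m == 1 && is_leap(r.year)));
        if (days < n) break;
        days -= n; r.month++;
    }
    r.day = (int)days + 1;
    return r;
}
```

Calling `utc_from_unix((uint64_t)1 << 32)` yields 7 February 2106, 06:28:16 UTC, the first instant that no longer fits in 32 bits.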

```
#define VERSION 1 /* version of the MMIX architecture that we support */
#define SUBVERSION 0 /* secondary byte of version number */
#define SUBSUBVERSION 1 /* further qualification to version number */
```

```
⟨Initialize everything 14⟩ +≡
  g[rK] = neg_one;
  g[rN].h = (VERSION ≪ 24) + (SUBVERSION ≪ 16) + (SUBSUBVERSION ≪ 8);
  g[rN].l = ABSTIME; /* see comment and warning above */
  g[rT].h = #80000005;
  g[rTT].h = #80000006;
  g[rV].h = #369c2004;
  if (lring_size < 256) lring_size = 256;
  lring_mask = lring_size - 1;
  if (lring_size & lring_mask)
    panic("The number of local registers must be a power of 2");
  l = (octa *) calloc(lring_size, sizeof(octa));
  if (¬l) panic("No room for the local registers");
  cur_round = ROUND_NEAR;
```
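The test `lring_size & lring_mask` works because a power of 2 has exactly one 1 bit; subtracting 1 clears that bit and sets all lower ones, so the two values share no bits. A minimal sketch of the same normalization follows, in which the function name and the -1 error return are assumptions standing in for the simulator's `panic`:

```c
/* Return the ring mask for a requested size, or -1 if the size
   (after being raised to the minimum of 256) is not a power of 2. */
int ring_mask_for(int requested)
{
    int size = requested < 256 ? 256 : requested;
    int mask = size - 1;
    if (size & mask) return -1;   /* a shared bit survives: not a power of 2 */
    return mask;
}
```

For instance, a request for 1024 registers gives the mask 1023, while a request for 300 is rejected.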

**78.** In operations like INCH, we want z to be the yz field, shifted left 48 bits. We also want y to be register X, which has previously been placed in b; then INCH can be simulated as if it were ADDU.

```
⟨Set z as an immediate wyde 78⟩ ≡
  {
    switch (op & 3) {
    case 0: z.h = yz ≪ 16; break;
    case 1: z.h = yz; break;
    case 2: z.l = yz ≪ 16; break;
    case 3: z.l = yz; break;
    }
    y = b;
  }
```

This code is used in section 71.
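The placement of the 16-bit immediate can be tried out in isolation. In the sketch below the `octa` typedef and the function `wyde_operand` are assumptions for illustration; the opcode values used in the usage note are those listed for INCH through INCL in the mini-indexes.

```c
#include <stdint.h>

/* Assumed stand-in for the simulator's octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa;

/* Place the wyde yz in the position selected by the low two opcode bits:
   0 = high wyde, 1 = medium high, 2 = medium low, 3 = low. */
octa wyde_operand(unsigned op, uint32_t yz)
{
    octa z = {0, 0};
    switch (op & 3) {
    case 0: z.h = yz << 16; break;
    case 1: z.h = yz; break;
    case 2: z.l = yz << 16; break;
    case 3: z.l = yz; break;
    }
    return z;
}
```

With INCH = #e4 the call `wyde_operand(0xe4, 0x1234)` puts #1234 in the top 16 bits of the high tetra, which is the yz field shifted left 48 bits.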

**79.** $\langle$ Set $b$ from special register 79 $\rangle \equiv$ $b = g[info[op].third\_operand];$

This code is used in section 71.

```
op: register mmix_opcode, §62.
panic = macro (), §14.
rA = 21, §55.
rB = 0, §55.
rK = 15, §55.
rN = 9, §55.
ROUND_NEAR = 4, §100.
rT = 13, §55.
rTT = 14, §55.
rV = 18, §55.
third_operand: unsigned char, §64.
time: time_t (), <time.h>.
xx: register int, §62.
y: octa, §61.
yy: register int, §62.
yz: register int, §62.
z: octa, §61.
zz: register int, §62.
```

```
80. ⟨Install register X as the destination, adjusting the register stack if necessary 80⟩ ≡
  if (xx ≥ G) {
    sprintf(lhs, "$%d=g[%d]", xx, xx);
    x_ptr = &g[xx];
  } else {
    while (xx ≥ L) ⟨Increase rL 81⟩;
    sprintf(lhs, "$%d=l[%d]", xx, (O + xx) & lring_mask);
    x_ptr = &l[(O + xx) & lring_mask];
  }
This code is used in section 60.

81. ⟨Increase rL 81⟩ ≡
  {
    l[(O + L) & lring_mask] = zero_octa;
    L = g[rL].l = L + 1;
    if (((S - O - L) & lring_mask) ≡ 0) stack_store();
  }
This code is used in section 80.
```

**82.** The *stack_store* routine advances the "gamma" pointer in the ring of local registers, by storing the oldest local register into memory location rS and advancing rS.

```
#define test_store_bkpt(ll) if ((ll)→bkpt & write_bit) breakpoint = tracing = true
⟨Subroutines 12⟩ +≡
  void stack_store ARGS((void));
  void stack_store()
  {
    register mem_tetra *ll = mem_find(g[rS]);
    register int k = S & lring_mask;
    ll→tet = l[k].h; test_store_bkpt(ll);
    (ll + 1)→tet = l[k].l; test_store_bkpt(ll + 1);
    if (stack_tracing) {
      tracing = true;
      if (cur_line) show_line();
      printf("             M8[#%08x%08x]=l[%d]=#%08x%08x, rS+=8\n",
             g[rS].h, g[rS].l, k, l[k].h, l[k].l);
    }
    g[rS] = incr(g[rS], 8), S++;
  }
```

**83.** The *stack_load* routine is essentially the inverse of *stack_store*.

```
#define test_load_bkpt(ll) if ((ll)→bkpt & read_bit) breakpoint = tracing = true
⟨Subroutines 12⟩ +≡
  void stack_load ARGS((void));
  void stack_load()
  {
    register mem_tetra *ll;
    register int k;
    S--, g[rS] = incr(g[rS], -8);
    ll = mem_find(g[rS]);
    k = S & lring_mask;
    l[k].h = ll→tet; test_load_bkpt(ll);
    l[k].l = (ll + 1)→tet; test_load_bkpt(ll + 1);
    if (stack_tracing) {
      tracing = true;
      if (cur_line) show_line();
      printf("             rS-=8, l[%d]=M8[#%08x%08x]=#%08x%08x\n",
             k, g[rS].h, g[rS].l, l[k].h, l[k].l);
    }
  }
```
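The interplay between growing rL and spilling the oldest register can be watched on a toy ring. The following sketch is not the simulator's code: it shrinks the ring to four entries, uses plain 64-bit integers for registers, and replaces simulated memory by a small array without bounds checks. All names in it (`ring_state`, `increase_L`, `spill_oldest`) are assumptions for illustration.

```c
#include <stdint.h>

enum { RING = 4, MASK = RING - 1 };   /* tiny hypothetical ring */

typedef struct {
    uint64_t l[RING];     /* ring of local registers */
    uint64_t mem[16];     /* stand-in for the memory stack; no bounds check */
    unsigned L, O, S;     /* as in the text, O and S matter only modulo RING */
    unsigned stored;      /* number of octabytes spilled so far */
} ring_state;

/* Spill the oldest local register to memory and advance S. */
static void spill_oldest(ring_state *r)
{
    r->mem[r->stored++] = r->l[r->S & MASK];
    r->S++;
}

/* Make one more local register available, spilling if the ring is full. */
void increase_L(ring_state *r)
{
    r->l[(r->O + r->L) & MASK] = 0;
    r->L++;
    if (((r->S - r->O - r->L) & MASK) == 0) spill_oldest(r);
}
```

Starting from an empty state, the fourth call to `increase_L` fills the ring, so the register at position S is written to `mem[0]` and S advances by one.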

```
ARGS = macro (), §11.
bkpt: unsigned char, §16.
breakpoint: bool, §61.
cur_line: int, §31.
g: octa [], §76.
G: register int, §75.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
l: octa *, §76.
L: register int, §75.
l: tetra, §10.
lhs: char [], §139.
lring_mask: int, §76.
mem_find: mem_tetra *(), §20.
mem_tetra = struct, §16.
O: register int, §75.
printf: int (), <stdio.h>.
read_bit = macro, §58.
rL = 20, §55.
rS = 11, §55.
S: int, §76.
show_line: void (), §47.
sprintf: int (), <stdio.h>.
stack_tracing: bool, §61.
tet: tetra, §16.
tracing: bool, §61.
true = 1, §9.
write_bit = macro, §58.
x_ptr: octa *, §61.
xx: register int, §62.
zero_octa: octa, MMIX-ARITH §4.
```

**84. Simulating the instructions.** The master switch branches in 256 directions, one for each MMIX instruction.

Let's start with ADD, since it is somehow the most typical case—not too easy, and not too hard. The task is to compute x = y + z, and to signal overflow if the sum is out of range. Overflow occurs if and only if y and z have the same sign but the sum has a different sign.

Overflow is one of the eight arithmetic exceptions. We record such exceptions in a variable called *exc*, which is set to zero at the beginning of each cycle and used to update rA at the end.

The main control routine has put the input operands into octabytes y and z. It has also made *x_ptr* point to the octabyte where the result should be placed.
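The overflow rule stated above needs only the sign bits of the two operands and of the sum. A minimal sketch, in which `octa`, `SIGN_BIT`, and `oplus_sketch` are assumed stand-ins for the simulator's type, its *sign_bit* macro, and the *oplus* routine of MMIX-ARITH:

```c
#include <stdint.h>

/* Assumed stand-in for the simulator's octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa;

#define SIGN_BIT 0x80000000u   /* assumed value of sign_bit */

/* 64-bit addition on two 32-bit halves, with carry from low to high. */
static octa oplus_sketch(octa y, octa z)
{
    octa x;
    x.h = y.h + z.h;
    x.l = y.l + z.l;
    if (x.l < y.l) x.h++;      /* the low half wrapped around */
    return x;
}

/* Nonzero if the signed sum y + z is out of range. */
int add_overflows(octa y, octa z)
{
    octa x = oplus_sketch(y, z);
    return ((y.h ^ z.h) & SIGN_BIT) == 0    /* y and z have the same sign */
        && ((y.h ^ x.h) & SIGN_BIT) != 0;   /* but the sum's sign differs */
}
```

Adding 1 to the largest positive value, or -1 to the most negative one, reports overflow; adding 1 to -1 does not, because the operands have different signs.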

```
⟨Cases for individual MMIX instructions 84⟩ ≡
case ADD: case ADDI: x = w; /* w = oplus(y, z) */
  if (((y.h ⊕ z.h) & sign_bit) ≡ 0 ∧ ((y.h ⊕ x.h) & sign_bit) ≠ 0) exc |= V_BIT;
store_x: *x_ptr = x; break;
See also sections 85, 86, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 101, 102, 104, 106, 107, 108, and 124.
This code is used in section 60.
```

**85.** Other cases of signed and unsigned addition and subtraction are, of course, similar. Overflow occurs in the calculation x = y - z if and only if it occurs in the calculation y = x + z.

```
⟨Cases for individual MMIX instructions 84⟩ +≡
case SUB: case SUBI: case NEG: case NEGI: x = ominus(y, z);
  if (((x.h ⊕ z.h) & sign_bit) ≡ 0 ∧ ((x.h ⊕ y.h) & sign_bit) ≠ 0) exc |= V_BIT;
  goto store_x;
case ADDU: case ADDUI: case INCH: case INCMH: case INCML: case INCL: x = w;
  goto store_x;
case SUBU: case SUBUI: case NEGU: case NEGUI: x = ominus(y, z); goto store_x;
case IIADDU: case IIADDUI: case IVADDU: case IVADDUI: case VIIIADDU:
  case VIIIADDUI: case XVIADDU: case XVIADDUI:
  x = oplus(shift_left(y, ((op & #f) ≫ 1) - 3), z); goto store_x;
case SETH: case SETMH: case SETML: case SETL: case GETA: case GETAB: x = z;
  goto store_x;
```

**86.** Let's get the simple bitwise operations out of the way too.

```
⟨Cases for individual MMIX instructions 84⟩ +≡
case OR: case ORI: case ORH: case ORMH: case ORML: case ORL: x.h = y.h | z.h;
  x.l = y.l | z.l; goto store_x;
case ORN: case ORNI: x.h = y.h | ~z.h; x.l = y.l | ~z.l; goto store_x;
case NOR: case NORI: x.h = ~(y.h | z.h); x.l = ~(y.l | z.l); goto store_x;
case XOR: case XORI: x.h = y.h ⊕ z.h; x.l = y.l ⊕ z.l; goto store_x;
case AND: case ANDI: x.h = y.h & z.h; x.l = y.l & z.l; goto store_x;
case ANDN: case ANDNI: case ANDNH: case ANDNMH: case ANDNML: case ANDNL:
  x.h = y.h & ~z.h; x.l = y.l & ~z.l; goto store_x;
case NAND: case NANDI: x.h = ~(y.h & z.h); x.l = ~(y.l & z.l); goto store_x;
case NXOR: case NXORI: x.h = ~(y.h ⊕ z.h); x.l = ~(y.l ⊕ z.l); goto store_x;
```

```
ADD = #20, §54.
ADDI = #21, §54.
ADDU = #22, §54.
ADDUI = #23, §54.
AND = #c8, §54.
ANDI = #c9, §54.
ANDN = #ca, §54.
ANDNH = #ec, §54.
ANDNI = #cb, §54.
ANDNL = #ef, §54.
ANDNMH = #ed, §54.
ANDNML = #ee, §54.
exc: int, §61.
GETA = #f4, §54.
GETAB = #f5, §54.
h: tetra, §10.
IIADDU = #28, §54.
IIADDUI = #29, §54.
INCH = #e4, §54.
INCL = #e7, §54.
INCMH = #e5, §54.
INCML = #e6, §54.
IVADDU = #2a, §54.
IVADDUI = #2b, §54.
l: tetra, §10.
NAND = #cc, §54.
NANDI = #cd, §54.
NEG = #34, §54.
NEGI = #35, §54.
NEGU = #36, §54.
NEGUI = #37, §54.
NOR = #c4, §54.
NORI = #c5, §54.
NXOR = #ce, §54.
NXORI = #cf, §54.
ominus: octa (), MMIX-ARITH §5.
op: register mmix_opcode, §62.
oplus: octa (), MMIX-ARITH §5.
OR = #c0, §54.
ORH = #e8, §54.
ORI = #c1, §54.
ORL = #eb, §54.
ORMH = #e9, §54.
ORML = #ea, §54.
ORN = #c2, §54.
ORNI = #c3, §54.
SETH = #e0, §54.
SETL = #e3, §54.
SETMH = #e1, §54.
SETML = #e2, §54.
shift_left: octa (), MMIX-ARITH §7.
sign_bit = macro, §15.
SUB = #24, §54.
SUBI = #25, §54.
SUBU = #26, §54.
SUBUI = #27, §54.
V_BIT = macro, §57.
VIIIADDU = #2c, §54.
VIIIADDUI = #2d, §54.
w: octa, §61.
x: octa, §61.
x_ptr: octa *, §61.
XOR = #c6, §54.
XORI = #c7, §54.
XVIADDU = #2e, §54.
XVIADDUI = #2f, §54.
y: octa, §61.
z: octa, §61.
```

**87.** The less simple bit manipulations are almost equally simple, given the subroutines of MMIX-ARITH. The MUX operation has three inputs; in such cases the inputs appear in y, z, and b.

```
#define shift_amt (z.h ∨ z.l ≥ 64 ? 64 : z.l)
⟨Cases for individual MMIX instructions 84⟩ +≡
case SL: case SLI: x = shift_left(y, shift_amt);
  a = shift_right(x, shift_amt, 0);
  if (a.h ≠ y.h ∨ a.l ≠ y.l) exc |= V_BIT;
  goto store_x;
case SLU: case SLUI: x = shift_left(y, shift_amt); goto store_x;
case SR: case SRI: case SRU: case SRUI: x = shift_right(y, shift_amt, op & #2);
  goto store_x;
case MUX: case MUXI: x.h = (y.h & b.h) | (z.h & ~b.h); x.l = (y.l & b.l) | (z.l & ~b.l);
  goto store_x;
case SADD: case SADDI: x.l = count_bits(y.h & ~z.h) + count_bits(y.l & ~z.l); goto store_x;
case MOR: case MORI: x = bool_mult(y, z, false); goto store_x;
case MXOR: case MXORI: x = bool_mult(y, z, true); goto store_x;
case BDIF: case BDIFI: x.h = byte_diff(y.h, z.h); x.l = byte_diff(y.l, z.l); goto store_x;
case WDIF: case WDIFI: x.h = wyde_diff(y.h, z.h); x.l = wyde_diff(y.l, z.l); goto store_x;
case TDIF: case TDIFI: if (y.h > z.h) x.h = y.h - z.h;
tdif_l: if (y.l > z.l) x.l = y.l - z.l; goto store_x;
case ODIF: case ODIFI: if (y.h > z.h) x = ominus(y, z);
  else if (y.h ≡ z.h) goto tdif_l;
  goto store_x;
```
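Two of the ideas above are easy to isolate: the bitwise multiplex, where each bit of the mask b selects between the corresponding bits of y and z, and the clamping of the shift count to 64. The sketch below restates both as functions; the `octa` typedef and the function names are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed stand-in for the simulator's octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa;

/* Bitwise multiplex: take bits of y where b is 1, bits of z where b is 0. */
octa mux(octa y, octa z, octa b)
{
    octa x;
    x.h = (y.h & b.h) | (z.h & ~b.h);
    x.l = (y.l & b.l) | (z.l & ~b.l);
    return x;
}

/* Shift count as in shift_amt: anything 64 or larger is treated as 64. */
unsigned shift_amount(octa z)
{
    return (z.h || z.l >= 64) ? 64 : z.l;
}
```

Clamping means that a shift by any huge amount behaves like a shift by exactly 64, so the shift routines never see a count above 64.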

**88.** When an operation has two outputs, the primary output is placed in x and the auxiliary output is placed in a.

```
⟨Cases for individual MMIX instructions 84⟩ +≡
case MUL: case MULI: x = signed_omult(y, z);
test_overflow: if (overflow) exc |= V_BIT;
  goto store_x;
case MULU: case MULUI: x = omult(y, z); a = g[rH] = aux; goto store_x;
case DIV: case DIVI: if (¬z.l ∧ ¬z.h) aux = y, exc |= D_BIT, overflow = false;
  else x = signed_odiv(y, z);
  a = g[rR] = aux; goto test_overflow;
case DIVU: case DIVUI: x = odiv(b, y, z); a = g[rR] = aux; goto store_x;
```

**89.** The floating point routines of MMIX-ARITH record exceptional events in a variable called *exceptions*. Here we simply merge those bits into the *exc* variable. The U\_BIT is not exactly the same as "underflow," but the true definition of underflow will be applied when *exc* is combined with rA.

```
⟨Cases for individual MMIX instructions 84⟩ +≡
case FADD: x = fplus(y, z);
fin_float: round_mode = cur_round;
store_fx: exc |= exceptions; goto store_x;
case FSUB: a = z; if (fcomp(a, zero_octa) ≠ 2) a.h ⊕= sign_bit;
  x = fplus(y, a); goto fin_float;
case FMUL: x = fmult(y, z); goto fin_float;
case FDIV: x = fdivide(y, z); goto fin_float;
case FREM: x = fremstep(y, z, 2500); goto fin_float;
case FSQRT: x = froot(z, y.l);
fin_unifloat: if (y.h ∨ y.l > 4) goto illegal_inst;
  round_mode = (y.l ? y.l : cur_round); goto store_fx;
case FINT: x = fintegerize(z, y.l); goto fin_unifloat;
case FIX: x = fixit(z, y.l); goto fin_unifloat;
case FIXU: x = fixit(z, y.l); exceptions &= ~W_BIT; goto fin_unifloat;
case FLOT: case FLOTI: case FLOTU: case FLOTUI:
case SFLOT: case SFLOTI: case SFLOTU: case SFLOTUI:
  x = floatit(z, y.l, op & #2, op & #4); goto fin_unifloat;
```

```
a: octa, §61.
aux: octa, MMIX-ARITH §4.
b: octa, §61.
BDIF = #d0, §54.
BDIFI = #d1, §54.
bool_mult: octa (), MMIX-ARITH §29.
byte_diff: tetra (), MMIX-ARITH §27.
count_bits: int (), MMIX-ARITH §26.
cur_round: int, MMIX-ARITH §30.
D_BIT = macro, §57.
DIV = #1c, §54.
DIVI = #1d, §54.
DIVU = #1e, §54.
DIVUI = #1f, §54.
exc: int, §61.
exceptions: int, MMIX-ARITH §32.
FADD = #04, §54.
false = 0, §9.
fcomp: int (), MMIX-ARITH §85.
FDIV = #14, §54.
fdivide: octa (), MMIX-ARITH §44.
FINT = #17, §54.
fintegerize: octa (), MMIX-ARITH §86.
FIX = #05, §54.
fixit: octa (), MMIX-ARITH §88.
FIXU = #07, §54.
floatit: octa (), MMIX-ARITH §89.
FLOT = #08, §54.
FLOTI = #09, §54.
FLOTU = #0a, §54.
FLOTUI = #0b, §54.
FMUL = #10, §54.
fmult: octa (), MMIX-ARITH §41.
fplus: octa (), MMIX-ARITH §46.
FREM = #16, §54.
fremstep: octa (), MMIX-ARITH §93.
froot: octa (), MMIX-ARITH §91.
FSQRT = #15, §54.
FSUB = #06, §54.
g: octa [], §76.
h: tetra, §10.
illegal_inst: label, §107.
l: tetra, §10.
MOR = #dc, §54.
MORI = #dd, §54.
MUL = #18, §54.
MULI = #19, §54.
MULU = #1a, §54.
MULUI = #1b, §54.
MUX = #d8, §54.
MUXI = #d9, §54.
MXOR = #de, §54.
MXORI = #df, §54.
ODIF = #d6, §54.
ODIFI = #d7, §54.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
op: register mmix_opcode, §62.
overflow: bool, MMIX-ARITH §4.
rH = 3, §55.
round_mode: int, §61.
rR = 6, §55.
SADD = #da, §54.
SADDI = #db, §54.
SFLOT = #0c, §54.
SFLOTI = #0d, §54.
SFLOTU = #0e, §54.
SFLOTUI = #0f, §54.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §15.
signed_odiv: octa (), MMIX-ARITH §24.
signed_omult: octa (), MMIX-ARITH §12.
SL = #38, §54.
SLI = #39, §54.
SLU = #3a, §54.
SLUI = #3b, §54.
SR = #3c, §54.
SRI = #3d, §54.
SRU = #3e, §54.
SRUI = #3f, §54.
store_x: label, §84.
TDIF = #d4, §54.
TDIFI = #d5, §54.
true = 1, §9.
U_BIT = macro, §57.
V_BIT = macro, §57.
W_BIT = macro, §57.
WDIF = #d2, §54.
WDIFI = #d3, §54.
wyde_diff: tetra (), MMIX-ARITH §28.
x: octa, §61.
y: octa, §61.
z: octa, §61.
zero_octa: octa, MMIX-ARITH §4.
```

**90.** We have now done all of the arithmetic operations except for the cases that compare two registers and yield a value of -1 or 0 or 1.

```
#define cmp_zero store_x /* x is 0 by default */
⟨Cases for individual MMIX instructions 84⟩ +≡
case CMP: case CMPI: if ((y.h & sign_bit) > (z.h & sign_bit)) goto cmp_neg;
  if ((y.h & sign_bit) < (z.h & sign_bit)) goto cmp_pos;
case CMPU: case CMPUI: if (y.h < z.h) goto cmp_neg;
  if (y.h > z.h) goto cmp_pos;
  if (y.l < z.l) goto cmp_neg;
  if (y.l ≡ z.l) goto cmp_zero;
cmp_pos: x.l = 1; goto store_x;
cmp_neg: x = neg_one; goto store_x;
case FCMPE: k = fepscomp(y, z, b, true);
  if (k) goto cmp_zero_or_invalid;
case FCMP: k = fcomp(y, z);
  if (k < 0) goto cmp_neg;
cmp_fin: if (k ≡ 1) goto cmp_pos;
cmp_zero_or_invalid: if (k ≡ 2) exc |= I_BIT;
  goto cmp_zero;
case FUN: if (fcomp(y, z) ≡ 2) goto cmp_pos; else goto cmp_zero;
case FEQL: if (fcomp(y, z) ≡ 0) goto cmp_pos; else goto cmp_zero;
case FEQLE: k = fepscomp(y, z, b, false);
  goto cmp_fin;
case FUNE: if (fepscomp(y, z, b, true) ≡ 2) goto cmp_pos; else goto cmp_zero;
```
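The signed comparison above falls through into the unsigned one: once the sign bits are known to agree, comparing the two's complement bit patterns as unsigned numbers gives the correct signed order. A sketch of that idea as two small functions, where `octa`, `SIGN_BIT`, and the function names are assumptions for illustration:

```c
#include <stdint.h>

/* Assumed stand-in for the simulator's octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa;

#define SIGN_BIT 0x80000000u   /* assumed value of sign_bit */

/* Unsigned three-way comparison: -1, 0, or 1. */
int cmp_unsigned(octa y, octa z)
{
    if (y.h != z.h) return y.h < z.h ? -1 : 1;
    if (y.l != z.l) return y.l < z.l ? -1 : 1;
    return 0;
}

/* Signed three-way comparison: settle differing signs first,
   then reuse the unsigned comparison. */
int cmp_signed(octa y, octa z)
{
    uint32_t ys = y.h & SIGN_BIT, zs = z.h & SIGN_BIT;
    if (ys > zs) return -1;    /* y negative, z nonnegative */
    if (ys < zs) return 1;     /* y nonnegative, z negative */
    return cmp_unsigned(y, z);
}
```

The bit pattern of -1 compares below 1 when signed but above it when unsigned, which is the only situation in which the two functions disagree.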

**91.** We have now done all the register-register operations except for the conditional commands. Conditional commands and branch commands all make use of a simple subroutine that determines whether a given octabyte satisfies the condition of a given opcode.

```
⟨Subroutines 12⟩ +≡
  int register_truth ARGS((octa, mmix_opcode));
  int register_truth(o, op)
       octa o;
       mmix_opcode op;
  { register int b;
    switch ((op ≫ 1) & #3) {
    case 0: b = o.h ≫ 31; break; /* negative? */
    case 1: b = (o.h ≡ 0 ∧ o.l ≡ 0); break; /* zero? */
    case 2: b = (o.h < sign_bit ∧ (o.h ∨ o.l)); break; /* positive? */
    case 3: b = o.l & #1; break; /* odd? */
    }
    if (op & #8) return b ⊕ 1;
    else return b;
  }
```
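The opcode layout that makes this subroutine work can be seen with the conditional-set opcodes from the mini-index: bits 1 and 2 select the test, and bit 3 negates it. The sketch below restates the routine with a prototype-style signature; the `octa` typedef, `SIGN_BIT`, the function name `condition_holds`, and the local enum are assumptions for illustration, with opcode values taken from the mini-index entries for CSN through CSEV.

```c
#include <stdint.h>

/* Assumed stand-in for the simulator's octa: two 32-bit halves. */
typedef struct { uint32_t h, l; } octa;

#define SIGN_BIT 0x80000000u   /* assumed value of sign_bit */

/* Opcode values as listed in the mini-index. */
enum { CSN = 0x60, CSZ = 0x62, CSP = 0x64, CSOD = 0x66,
       CSNN = 0x68, CSNZ = 0x6a, CSNP = 0x6c, CSEV = 0x6e };

/* Does octabyte o satisfy the condition encoded in opcode op? */
int condition_holds(octa o, unsigned op)
{
    int b;
    switch ((op >> 1) & 3) {
    case 0: b = (int)(o.h >> 31); break;                   /* negative? */
    case 1: b = (o.h == 0 && o.l == 0); break;             /* zero? */
    case 2: b = (o.h < SIGN_BIT && (o.h || o.l)); break;   /* positive? */
    default: b = (int)(o.l & 1); break;                    /* odd? */
    }
    return (op & 8) ? b ^ 1 : b;   /* bit 3 negates the condition */
}
```

For a zero octabyte, `condition_holds` is true for CSZ, CSNN, CSNP, and CSEV and false for the four opposite conditions.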

**92.** The b operand will be zero on the ZS operations; it will be the contents of register X on the CS operations.

```
ARGS = macro (), §11.
b: octa, §61.
CMP = #30, §54.
CMPI = #31, §54.
CMPU = #32, §54.
CMPUI = #33, §54.
CSEV = #6e, §54.
CSEVI = #6f, §54.
CSN = #60, §54.
CSNI = #61, §54.
CSNN = #68, §54.
CSNNI = #69, §54.
CSNP = #6c, §54.
CSNPI = #6d, §54.
CSNZ = #6a, §54.
CSNZI = #6b, §54.
CSOD = #66, §54.
CSODI = #67, §54.
CSP = #64, §54.
CSPI = #65, §54.
CSZ = #62, §54.
CSZI = #63, §54.
exc: int, §61.
false = 0, §9.
FCMP = #01, §54.
FCMPE = #11, §54.
fcomp: int (), MMIX-ARITH §85.
fepscomp: int (), MMIX-ARITH §50.
FEQL = #03, §54.
FEQLE = #13, §54.
FUN = #02, §54.
FUNE = #12, §54.
h: tetra, §10.
I_BIT = macro, §57.
k: register int, §62.
l: tetra, §10.
mmix_opcode = enum, §54.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §10.
op: register mmix_opcode, §62.
sign_bit = macro, §15.
store_x: label, §84.
true = 1, §9.
x: octa, §61.
y: octa, §61.
z: octa, §61.
ZSEV = #7e, §54.
ZSEVI = #7f, §54.
ZSN = #70, §54.
ZSNI = #71, §54.
ZSNN = #78, §54.
ZSNNI = #79, §54.
ZSNP = #7c, §54.
ZSNPI = #7d, §54.
ZSNZ = #7a, §54.
ZSNZI = #7b, §54.
ZSOD = #76, §54.
ZSODI = #77, §54.
ZSP = #74, §54.
ZSPI = #75, §54.
ZSZ = #72, §54.
ZSZI = #73, §54.
```

**93.** Didn't that feel good, when 32 opcodes reduced to a single case? We get to do it one more time. Happiness!

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case BN: case BNB: case BZ: case BZB:
case BP: case BPB: case BOD: case BODB:
case BNN: case BNNB: case BNZ: case BNZB:
case BNP: case BNPB: case BEV: case BEVB:
case PBN: case PBNB: case PBZ: case PBZB:
case PBP: case PBPB: case PBOD: case PBODB:
case PBNN: case PBNNB: case PBNZ: case PBNZB:
case PBNP: case PBNPB: case PBEV: case PBEVB:
  x.l = register_truth(b, op);
  if (x.l) {
    inst_ptr = z;
    good = (op ≥ PBN);
  } else good = (op < PBN);
  if (good) good_guesses++;
  else {
    bad_guesses++, sclock.l += 2; /* penalty is 2υ for bad guess */
    if (g[rI].l ≤ 2 ∧ g[rI].l ∧ g[rI].h ≡ 0) tracing = breakpoint = true;
    g[rI] = incr(g[rI], -2);
  }
  break;
```

94. Memory operations are next on our agenda. The memory address, y + z, has already been placed in w.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case LDB: case LDBI: case LDBU: case LDBUI:
  i = 56; j = (w.l & #3) ≪ 3;
  goto fin_ld;
case LDW: case LDWI: case LDWU: case LDWUI:
  i = 48; j = (w.l & #2) ≪ 3;
  goto fin_ld;
case LDT: case LDTI: case LDTU: case LDTUI:
  i = 32; j = 0; goto fin_ld;
case LDHT: case LDHTI: i = j = 0;
fin_ld: ll = mem_find(w); test_load_bkpt(ll);
  x.h = ll→tet;
  x = shift_right(shift_left(x, j), i, op & #2);
check_ld: if (w.h & sign_bit) goto privileged_inst;
  goto store_x;
case LDO: case LDOI: case LDOU: case LDOUI: case LDUNC: case LDUNCI:
  w.l &= -8; ll = mem_find(w);
  test_load_bkpt(ll); test_load_bkpt(ll + 1);
  x.h = ll→tet; x.l = (ll + 1)→tet;
  goto check_ld;
case LDSF: case LDSFI: ll = mem_find(w); test_load_bkpt(ll);
  x = load_sf(ll→tet); goto check_ld;
```
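
The shift pair at fin_ld can be tried in isolation. The following C sketch is not part of MMIX-SIM: `load_field` and its parameters are hypothetical names, and a plain 64-bit integer stands in for the octa structure and for the shift_left and shift_right routines of MMIX-ARITH. It assumes the big-endian byte numbering of MMIX, and it spells out the sign extension by hand because a right shift of a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract a byte, wyde, or tetra from the 32-bit word
   `tet` that contains it. `i` is 56, 48, or 32, as in the cases above, and
   `addr` is the byte address (the role played by w). */
static uint64_t load_field(uint32_t tet, uint64_t addr, int i, int is_unsigned)
{
    assert(i == 56 || i == 48 || i == 32);
    int j = 0;                            /* left shift that discards earlier bytes */
    if (i == 56) j = (int)(addr & 3) << 3;
    else if (i == 48) j = (int)(addr & 2) << 3;

    uint64_t x = (uint64_t)tet << 32;     /* x.h = tet, x.l = 0 */
    x <<= j;                              /* the wanted field is now leftmost */
    uint64_t r = x >> i;                  /* logical shift: zero extension */
    if (!is_unsigned && (x >> 63))        /* signed load: replicate the sign bit */
        r |= ~(uint64_t)0 << (64 - i);
    return r;
}
```

For example, with the tetrabyte #01ff8003 a signed byte load at an address ending in binary 01 yields -1, while the unsigned form yields #ff.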

```
b: octa, §61.
bad_guesses: int, §139.
BEV = #4e, §54.
BEVB = #4f, §54.
BN = #40, §54.
BNB = #41, §54.
BNN = #48, §54.
BNNB = #49, §54.
BNP = #4c, §54.
BNPB = #4d, §54.
BNZ = #4a, §54.
BNZB = #4b, §54.
BOD = #46, §54.
BODB = #47, §54.
BP = #44, §54.
BPB = #45, §54.
breakpoint: bool, §61.
BZ = #42, §54.
BZB = #43, §54.
g: octa [], §76.
good: bool, §61.
good_guesses: int, §139.
h: tetra, §10.
i: register int, §62.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
j: register int, §62.
l: tetra, §10.
LDB = #80, §54.
LDBI = #81, §54.
LDBU = #82, §54.
LDBUI = #83, §54.
LDHT = #92, §54.
LDHTI = #93, §54.
LDO = #8c, §54.
LDOI = #8d, §54.
LDOU = #8e, §54.
LDOUI = #8f, §54.
LDSF = #90, §54.
LDSFI = #91, §54.
LDT = #88, §54.
LDTI = #89, §54.
LDTU = #8a, §54.
LDTUI = #8b, §54.
LDUNC = #96, §54.
LDUNCI = #97, §54.
LDW = #84, §54.
LDWI = #85, §54.
LDWU = #86, §54.
LDWUI = #87, §54.
ll: register mem_tetra *, §62.
load_sf: octa (), MMIX-ARITH §39.
mem_find: mem_tetra *(), §20.
op: register mmix_opcode, §62.
PBEV = #5e, §54.
PBEVB = #5f, §54.
PBN = #50, §54.
PBNB = #51, §54.
PBNN = #58, §54.
PBNNB = #59, §54.
PBNP = #5c, §54.
PBNPB = #5d, §54.
PBNZ = #5a, §54.
PBNZB = #5b, §54.
PBOD = #56, §54.
PBODB = #57, §54.
PBP = #54, §54.
PBPB = #55, §54.
PBZ = #52, §54.
PBZB = #53, §54.
privileged_inst: label, §107.
register_truth: int (), §91.
rI = 12, §55.
sclock: octa, §19.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §15.
store_x: label, §84.
test_load_bkpt = macro(), §83.
tet: tetra, §16.
tracing: bool, §61.
true = 1, §9.
w: octa, §61.
x: octa, §61.
y: octa, §61.
z: octa, §61.
```

95.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case STB: case STBI: case STBU: case STBUI:
  i = 56; j = (w.l & #3) ≪ 3;
  goto fin_pst;
case STW: case STWI: case STWU: case STWUI:
  i = 48; j = (w.l & #2) ≪ 3;
  goto fin_pst;
case STT: case STTI: case STTU: case STTUI:
  i = 32; j = 0;
fin_pst: ll = mem_find(w);
  if ((op & #2) ≡ 0) {
    a = shift_right(shift_left(b, i), i, 0);
    if (a.h ≠ b.h ∨ a.l ≠ b.l) exc |= V_BIT;
  }
  ll→tet ⊕= (ll→tet ⊕ (b.l ≪ (i - 32 - j))) & ((((tetra) -1) ≪ (i - 32)) ≫ j);
  goto fin_st;
case STSF: case STSFI: ll = mem_find(w);
  ll→tet = store_sf(b); exc = exceptions;
  goto fin_st;
case STHT: case STHTI: ll = mem_find(w); ll→tet = b.h;
fin_st: test_store_bkpt(ll);
  w.l &= -8; ll = mem_find(w);
  a.h = ll→tet; a.l = (ll + 1)→tet; /* for trace output */
  goto check_st;
case STCO: case STCOI: b.l = xx;
case STO: case STOI: case STOU: case STOUI: case STUNC: case STUNCI:
  w.l &= -8; ll = mem_find(w);
  test_store_bkpt(ll); test_store_bkpt(ll + 1);
  ll→tet = b.h; (ll + 1)→tet = b.l;
check_st: if (w.h & sign_bit) goto privileged_inst;
  break;
```

96. The CSWAP operation has elements of both loading and storing. We shuffle some of the operands around so that they will appear correctly in the trace output.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case CSWAP: case CSWAPI: w.l &= -8; ll = mem_find(w);
  test_load_bkpt(ll); test_load_bkpt(ll + 1);
  a = g[rP];
  if (ll→tet ≡ a.h ∧ (ll + 1)→tet ≡ a.l) {
    x.h = 0, x.l = 1;
    test_store_bkpt(ll); test_store_bkpt(ll + 1);
    ll→tet = b.h, (ll + 1)→tet = b.l;
    strcpy(rhs, "M8[%#w]=%#b");
  } else {
    b.h = ll→tet, b.l = (ll + 1)→tet;
    g[rP] = b;
    strcpy(rhs, "rP=%#b");
  }
  goto check_ld;
```
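
Two idioms from the store and CSWAP cases are easy to check on their own. The C sketch below is illustrative only: `merge_field` and `cswap` are hypothetical names, and plain integers replace the simulator's octa and mem_tetra types. The first function is the masked exclusive-or merge used at fin_pst, which changes only the bits of one field inside a tetrabyte; the second models the compare-and-swap outcome on a single octabyte and returns the value destined for register X.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: overwrite one field of `tet` with the low bits of
   `value`. i is 56, 48, or 32 (byte, wyde, tetra); j is the bit offset of
   the field from the left end of the tetrabyte. */
static uint32_t merge_field(uint32_t tet, uint32_t value, int i, int j)
{
    assert((i == 56 || i == 48 || i == 32) && j >= 0 && j <= i - 32);
    uint32_t mask = (uint32_t)((uint64_t)UINT32_MAX << (i - 32)) >> j; /* ones over the field */
    uint32_t shifted = (uint32_t)((uint64_t)value << (i - 32 - j));    /* value aligned with it */
    return tet ^ ((tet ^ shifted) & mask);      /* bits outside the mask are unchanged */
}

/* Hypothetical model of CSWAP: if memory equals rP, store x_reg and report 1;
   otherwise copy memory into rP and report 0. */
static uint64_t cswap(uint64_t *mem, uint64_t *rP, uint64_t x_reg)
{
    if (*mem == *rP) {
        *mem = x_reg;
        return 1;
    }
    *rP = *mem;
    return 0;
}
```

The merge needs no branch on the field width: the same expression serves bytes, wydes, and tetras, which is why the three groups of store opcodes can share one tail.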

97. The GET command is permissive, but PUT is restrictive.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case GET: if (yy ≠ 0 ∨ zz ≥ 32) goto illegal_inst;
  x = g[zz];
  goto store_x;
case PUT: case PUTI: if (yy ≠ 0 ∨ xx ≥ 32) goto illegal_inst;
  strcpy(rhs, "%z = %#z");
  if (xx ≥ 8) {
    if (xx ≤ 11 ∧ xx ≠ 8) goto illegal_inst; /* can't change rN, rO, rS */
    if (xx ≤ 18) goto privileged_inst;
    if (xx ≡ rA) ⟨ Get ready to update rA 100 ⟩
    else if (xx ≡ rL) ⟨ Set L = z = min(z, L) 98 ⟩
    else if (xx ≡ rG) ⟨ Get ready to update rG 99 ⟩;
  }
  g[xx] = z; zz = xx; break;
```

98.

```
⟨ Set L = z = min(z, L) 98 ⟩ ≡
  {
    x = z; strcpy(rhs, z.h ? "min(rL,%#x) = %z" : "min(rL,%x) = %z");
    if (z.l > L ∨ z.h) z.h = 0, z.l = L;
    else old_L = L = z.l;
  }
```

This code is used in section 97.
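
The restrictions that PUT places on the special register number can be summarized as a small classifier. This is a sketch under stated assumptions, not code from MMIX-SIM: `classify_put` and the `put_result` names are hypothetical, and the boundaries (8 through 18, with 9 through 11 illegal) are the ones implied by the comment "can't change rN, rO, rS" together with the register numbers of section 55. The extra range checks for rA, rL, and rG are left out.

```c
#include <assert.h>

enum put_result { PUT_OK, PUT_ILLEGAL, PUT_PRIVILEGED };

/* Hypothetical classifier for PUT's X and Y fields (each a byte, 0..255). */
static enum put_result classify_put(int xx, int yy)
{
    assert(xx >= 0 && xx <= 255 && yy >= 0 && yy <= 255);
    if (yy != 0 || xx >= 32) return PUT_ILLEGAL;
    if (xx >= 8) {
        if (xx <= 11 && xx != 8) return PUT_ILLEGAL;   /* rN, rO, rS */
        if (xx <= 18) return PUT_PRIVILEGED;           /* rC and registers 12..18 */
    }
    return PUT_OK;   /* rA, rL, rG still need the value checks of sections 98 to 100 */
}
```

Under these assumptions `classify_put(4, 0)` (rJ) is accepted, `classify_put(10, 0)` (rO) is illegal, and `classify_put(12, 0)` (rI) is privileged.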

```
a: octa, §61.
b: octa, §61.
check_ld: label, §94.
CSWAP = #94, §54.
CSWAPI = #95, §54.
exc: int, §61.
exceptions: int, MMIX-ARITH §32.
g: octa [], §76.
GET = #fe, §54.
h: tetra, §10.
i: register int, §62.
illegal_inst: label, §107.
j: register int, §62.
l: tetra, §10.
L: register int, §75.
ll: register mem_tetra *, §62.
mem_find: mem_tetra *(), §20.
old_L: int, §61.
op: register mmix_opcode, §62.
privileged_inst: label, §107.
PUT = #f6, §54.
PUTI = #f7, §54.
rA = 21, §55.
rG = 19, §55.
rhs = macro, §139.
rL = 20, §55.
rP = 23, §55.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §15.
STB = #a0, §54.
STBI = #a1, §54.
STBU = #a2, §54.
STBUI = #a3, §54.
STCO = #b4, §54.
STCOI = #b5, §54.
STHT = #b2, §54.
STHTI = #b3, §54.
STO = #ac, §54.
STOI = #ad, §54.
store_sf: tetra (), MMIX-ARITH §40.
store_x: label, §84.
STOU = #ae, §54.
STOUI = #af, §54.
strcpy: char *(), <string.h>.
STSF = #b0, §54.
STSFI = #b1, §54.
STT = #a8, §54.
STTI = #a9, §54.
STTU = #aa, §54.
STTUI = #ab, §54.
STUNC = #b6, §54.
STUNCI = #b7, §54.
STW = #a4, §54.
STWI = #a5, §54.
STWU = #a6, §54.
STWUI = #a7, §54.
test_load_bkpt = macro(), §83.
test_store_bkpt = macro(), §82.
tet: tetra, §16.
tetra = unsigned int, §10.
V_BIT = macro, §57.
w: octa, §61.
x: octa, §61.
xx: register int, §62.
yy: register int, §62.
z: octa, §61.
zz: register int, §62.
```

99.

```
⟨ Get ready to update rG 99 ⟩ ≡
  {
    if (z.h ≠ 0 ∨ z.l > 255 ∨ z.l < L ∨ z.l < 32) goto illegal_inst;
    for (j = z.l; j < G; j++) g[j] = zero_octa;
    G = z.l;
  }
```

This code is used in section 97.

100.

```
#define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4

⟨ Get ready to update rA 100 ⟩ ≡
  {
    if (z.h ≠ 0 ∨ z.l ≥ #40000) goto illegal_inst;
    cur_round = (z.l ≥ #10000 ? z.l ≫ 16 : ROUND_NEAR);
  }
```

This code is used in section 97.

101. Pushing and popping are rather delicate, because we want to trace them coherently.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case PUSHGO: case PUSHGOI: inst_ptr = w; goto push;
case PUSHJ: case PUSHJB: inst_ptr = z;
push: if (xx ≥ G) {
    xx = L++;
    if (((S - O - L) & lring_mask) ≡ 0) stack_store();
  }
  x.l = xx; l[(O + xx) & lring_mask] = x; /* the "hole" records the amount pushed */
  sprintf(lhs, "l[%d]=%d, ", (O + xx) & lring_mask, xx);
  x = g[rJ] = incr(loc, 4);
  L -= xx + 1; O += xx + 1;
  b = g[rO] = incr(g[rO], (xx + 1) ≪ 3);
sync_L: a.l = g[rL].l = L; break;
case POP: if (xx ≠ 0 ∧ xx ≤ L) y = l[(O + xx - 1) & lring_mask];
  if (g[rS].l ≡ g[rO].l) stack_load();
  k = l[(O - 1) & lring_mask].l & #ff;
  while ((tetra)(O - S) ≤ (tetra) k) stack_load();
  L = k + (xx ≤ L ? xx : L + 1);
  if (L > G) L = G;
  if (L > k) {
    l[(O - 1) & lring_mask] = y;
    if (y.h) sprintf(lhs, "l[%d]=#%x%08x, ", (O - 1) & lring_mask, y.h, y.l);
    else sprintf(lhs, "l[%d]=#%x, ", (O - 1) & lring_mask, y.l);
  } else lhs[0] = '\0';
  y = g[rJ]; z.l = yz ≪ 2; inst_ptr = oplus(y, z);
  O -= k + 1; b = g[rO] = incr(g[rO], -((k + 1) ≪ 3));
  goto sync_L;
```
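
The role of the "hole" in pushing and popping is perhaps easiest to see in a stripped-down model. The C sketch below is an illustration under several simplifying assumptions, not the simulator's code: the ring is assumed large enough that nothing is ever spilled to memory (no stack_store or stack_load), the cap of L at G is omitted, only the cases X < L for a push and X ≤ L for a pop are handled, and `frame_model`, `push`, and `pop` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define RING 256                       /* assumed ring size, for illustration */

typedef struct {
    uint64_t l[RING];                  /* ring of local registers */
    int O;                             /* ring index of the current $0 */
    int L;                             /* number of local registers in this frame */
} frame_model;

/* PUSHJ $X with X < L: $X becomes the hole; $(X+1).. become the callee's $0.. */
static void push(frame_model *m, int x)
{
    assert(x >= 0 && x < m->L);
    m->l[(m->O + x) % RING] = (uint64_t)x;   /* the hole records the amount pushed */
    m->L -= x + 1;
    m->O += x + 1;
}

/* POP n with n <= L: return to the caller; the main result lands in the hole */
static void pop(frame_model *m, int n)
{
    assert(n >= 0 && n <= m->L && m->O > 0);
    uint64_t y = n ? m->l[(m->O + n - 1) % RING] : 0;   /* main result, if any */
    int k = (int)(m->l[(m->O - 1) % RING] & 0xff);      /* amount that was pushed */
    m->L = k + n;
    if (m->L > k) m->l[(m->O - 1) % RING] = y;
    m->O -= k + 1;
}
```

If the caller holds 10, 11, 12 in $0, $1, $2 and executes a push with X = 1, the callee sees 12 as its $0. After the callee stores 99 there and pops one register, the caller is left with $0 = 10 and $1 = 99, and L = 2.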

102. To complete our simulation of MMIX's register stack, we need to implement SAVE and UNSAVE.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case SAVE: if (xx < G ∨ yy ≠ 0 ∨ zz ≠ 0) goto illegal_inst;
  l[(O + L) & lring_mask].l = L, L++;
  if (((S - O - L) & lring_mask) ≡ 0) stack_store();
  O += L; g[rO] = incr(g[rO], L ≪ 3);
  L = g[rL].l = 0;
  while (g[rO].l ≠ g[rS].l) stack_store();
  for (k = G; ; ) {
    ⟨ Store g[k] in the register stack 103 ⟩;
    if (k ≡ 255) k = rB;
    else if (k ≡ rR) k = rP;
    else if (k ≡ rZ + 1) break;
    else k++;
  }
  O = S, g[rO] = g[rS];
  x = incr(g[rO], -8); goto store_x;
```
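
The loop in SAVE visits registers in a fixed order: the globals from G up to 255, then the special registers rB through rR, then rP through rZ, and finally the pseudo-index rZ + 1 that stands for the pair (rG, rA). The C sketch below merely enumerates that order; `save_order` is a hypothetical name, and the register numbers are those listed in section 55.

```c
#include <assert.h>

enum { rB = 0, rR = 6, rP = 23, rZ = 27 };   /* special register numbers, section 55 */

/* Hypothetical helper: record the sequence of k values that SAVE stores,
   given the current G; returns how many entries were written to out[]. */
static int save_order(int G, int out[256])
{
    assert(G >= 32 && G <= 255);
    int n = 0;
    for (int k = G; ; ) {
        out[n++] = k;
        if (k == 255) k = rB;
        else if (k == rR) k = rP;
        else if (k == rZ + 1) break;
        else k++;
    }
    return n;
}
```

With G = 254 the sequence is 254, 255, 0, 1, ..., 6, 23, ..., 27, 28, for a total of 15 octabytes.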

```
a: octa, §61.
b: octa, §61.
cur_round: int, MMIX-ARITH §30.
g: octa [], §76.
G: register int, §75.
h: tetra, §10.
illegal_inst: label, §107.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
j: register int, §62.
k: register int, §62.
L: register int, §75.
l: tetra, §10.
l: octa *, §76.
lhs: char [], §139.
loc: octa, §61.
lring_mask: int, §76.
O: register int, §75.
oplus: octa (), MMIX-ARITH §5.
POP = #f8, §54.
PUSHGO = #be, §54.
PUSHGOI = #bf, §54.
PUSHJ = #f2, §54.
PUSHJB = #f3, §54.
rB = 0, §55.
rJ = 4, §55.
rL = 20, §55.
rO = 10, §55.
rP = 23, §55.
rR = 6, §55.
rS = 11, §55.
rZ = 27, §55.
S: int, §76.
SAVE = #fa, §54.
sprintf: int (), <stdio.h>.
stack_load: void (), §83.
stack_store: void (), §82.
store_x: label, §84.
tetra = unsigned int, §10.
w: octa, §61.
x: octa, §61.
xx: register int, §62.
y: octa, §61.
yy: register int, §62.
yz: register int, §62.
z: octa, §61.
zero_octa: octa, MMIX-ARITH §4.
zz: register int, §62.
```

103. This part of the program naturally has a lot in common with the stack_store subroutine. (There's a little white lie in the section name; if k is rZ + 1, we store rG and rA, not g[k].)

```
⟨ Store g[k] in the register stack 103 ⟩ ≡
       ll = mem_find(g[rS]);
       if (k ≡ rZ + 1) x.h = G ≪ 24, x.l = g[rA].l;
       else x = g[k];
       ll→tet = x.h; test_store_bkpt(ll);
       (ll + 1)→tet = x.l; test_store_bkpt(ll + 1);
       if (stack_tracing) {
              tracing = true;
              if (cur_line) show_line();
              g[rS].h, g[rS].l, k, x.h, x.l);
              else printf("_{||}||) = \frac{1}{2} - 
                                    q[rS].l, k \equiv rZ + 1? "(rG,rA)": special\_name[k], x.h, x.l);
       }
       S++, g[rS] = incr(g[rS], 8);
This code is used in section 102.
104. ⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case UNSAVE: if (xx ≠ 0 ∨ yy ≠ 0) goto illegal_inst;
       z.l &= -8; g[rS] = incr(z, 8);
       for (k = rZ + 1; ; ) {
              ⟨ Load g[k] from the register stack 105 ⟩;
              if (k ≡ rP) k = rR;
              else if (k ≡ rB) k = 255;
              else if (k ≡ G) break;
              else k--;
       }
       S = g[rS].l ≫ 3;
       stack_load();
       k = l[S & lring_mask].l & #ff;
       for (j = 0; j < k; j++) stack_load();
       O = S; g[rO] = g[rS]; L = k > G ? G : k;
       g[rL].l = L; a = g[rL]; g[rG].l = G; break;
105. ⟨ Load g[k] from the register stack 105 ⟩ ≡
       g[rS] = incr(g[rS], -8);
       ll = mem_find(g[rS]);
       test_load_bkpt(ll); test_load_bkpt(ll + 1);
       if (k ≡ rZ + 1) {
              x.l = G = g[rG].l = ll→tet ≫ 24, a.l = g[rA].l = (ll + 1)→tet & #3ffff;
              if (G < 32) x.l = G = g[rG].l = 32;
       } else g[k].h = ll→tet, g[k].l = (ll + 1)→tet;
       if (stack_tracing) {
              tracing = true;
              if (cur_line) show_line();
              if (k \ge 32) printf("ululululululululuss-=8, ug[%d]=M8[#%08x%08x]=#%08x%08x\n", k,
                                    g[rS].h, g[rS].l, ll \rightarrow tet, (ll + 1) \rightarrow tet);
```

This code is used in section 104.
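
The final octabyte written by SAVE, and the first one read by UNSAVE, packs rG and rA together: G goes into the top byte of the high tetrabyte and rA into the low tetrabyte. A minimal C sketch of that packing follows; `octa_pair`, `pack_rG_rA`, and `unpack_rG_rA` are hypothetical names. It mirrors the two details visible in section 105, namely that rA is masked to 18 bits and that G is forced up to 32 if the stored value is smaller.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_pair;   /* stand-in for the octa type */

/* Pack G and rA as SAVE does for the pseudo-register rZ + 1 */
static octa_pair pack_rG_rA(uint32_t G, uint32_t rA)
{
    assert(G <= 255 && rA <= 0x3ffff);
    octa_pair x = { G << 24, rA };
    return x;
}

/* Unpack as UNSAVE does, sanitizing both fields */
static void unpack_rG_rA(octa_pair x, uint32_t *G, uint32_t *rA)
{
    *G = x.h >> 24;
    *rA = x.l & 0x3ffff;
    if (*G < 32) *G = 32;              /* G may never drop below 32 */
}
```

Because UNSAVE reads whatever is in memory, the sanitizing step matters: a corrupted or hand-built context cannot make G smaller than the number of special registers.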

106. The cache maintenance instructions don't affect this simulation, because there are no caches. But if the user has invoked them, we do provide a bit of information when tracing, indicating the scope of the instruction.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case SYNCID: case SYNCIDI: case PREST: case PRESTI:
case SYNCD: case SYNCDI: case PREGO: case PREGOI:
case PRELD: case PRELDI: x = incr(w, xx); break;
```

107. Several loose ends remain to be nailed down.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case GO: case GOI: x = inst_ptr; inst_ptr = w; goto store_x;
case JMP: case JMPB: inst_ptr = z;
case SWYM: break;
case SYNC: if (xx ≠ 0 ∨ yy ≠ 0 ∨ zz > 7) goto illegal_inst;
  if (zz ≤ 3) break;
case LDVTS: case LDVTSI: privileged_inst: strcpy(lhs, "!privileged");
  goto break_inst;
illegal_inst: strcpy(lhs, "!illegal");
break_inst: breakpoint = tracing = true;
  if (¬interacting ∧ ¬interact_after_break) halted = true;
  break;
```

```
a: octa, §61.
breakpoint: bool, §61.
cur_line: int, §31.
G: register int, §75.
g: octa [], §76.
GO = #9e, §54.
GOI = #9f, §54.
h: tetra, §10.
halted: bool, §61.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
interact_after_break: bool, §61.
interacting: bool, §61.
j: register int, §62.
JMP = #f0, §54.
JMPB = #f1, §54.
k: register int, §62.
l: tetra, §10.
l: octa *, §76.
L: register int, §75.
LDVTS = #98, §54.
LDVTSI = #99, §54.
lhs: char [], §139.
ll: register mem_tetra *, §62.
lring_mask: int, §76.
mem_find: mem_tetra *(), §20.
O: register int, §75.
PREGO = #9c, §54.
PREGOI = #9d, §54.
PRELD = #9a, §54.
PRELDI = #9b, §54.
PREST = #ba, §54.
PRESTI = #bb, §54.
printf: int (), <stdio.h>.
rA = 21, §55.
rB = 0, §55.
rG = 19, §55.
rL = 20, §55.
rO = 10, §55.
rP = 23, §55.
rR = 6, §55.
rS = 11, §55.
rZ = 27, §55.
S: int, §76.
show_line: void (), §47.
special_name: char *[], §56.
stack_load: void (), §83.
stack_store: void (), §82.
stack_tracing: bool, §61.
store_x: label, §84.
strcpy: char *(), <string.h>.
SWYM = #fd, §54.
SYNC = #fc, §54.
SYNCD = #b8, §54.
SYNCDI = #b9, §54.
SYNCID = #bc, §54.
SYNCIDI = #bd, §54.
test_load_bkpt = macro(), §83.
test_store_bkpt = macro(), §82.
tet: tetra, §16.
tracing: bool, §61.
true = 1, §9.
UNSAVE = #fb, §54.
w: octa, §61.
x: octa, §61.
xx: register int, §62.
yy: register int, §62.
z: octa, §61.
zz: register int, §62.
```

108. Trips and traps. We have now implemented 253 of the 256 instructions: all but TRIP, TRAP, and RESUME.

The TRIP instruction simply turns H_BIT on in the exc variable; this will trigger an interruption to location 0.

The TRAP instruction is not simulated, except for the system calls mentioned in the introduction.

```
⟨ Cases for individual MMIX instructions 84 ⟩ +≡
case TRIP: exc |= H_BIT; break;
case TRAP: if (xx ≠ 0 ∨ yy > max_sys_call) goto privileged_inst;
  strcpy(rhs, trap_format[yy]);
  g[rWW] = inst_ptr;
  g[rXX].h = sign_bit, g[rXX].l = inst;
  g[rYY] = y, g[rZZ] = z;
  z.h = 0, z.l = zz;
  a = incr(b, 8);
  ⟨ Prepare memory arguments ma = M[a] and mb = M[b] if needed 111 ⟩;
  switch (yy) {
  case Halt: ⟨ Either halt or print warning 109 ⟩; g[rBB] = g[255]; break;
  case Fopen: g[rBB] = mmix_fopen((unsigned char) zz, mb, ma); break;
  case Fclose: g[rBB] = mmix_fclose((unsigned char) zz); break;
  case Fread: g[rBB] = mmix_fread((unsigned char) zz, mb, ma); break;
  case Fgets: g[rBB] = mmix_fgets((unsigned char) zz, mb, ma); break;
  case Fgetws: g[rBB] = mmix_fgetws((unsigned char) zz, mb, ma); break;
  case Fwrite: g[rBB] = mmix_fwrite((unsigned char) zz, mb, ma); break;
  case Fputs: g[rBB] = mmix_fputs((unsigned char) zz, b); break;
  case Fputws: g[rBB] = mmix_fputws((unsigned char) zz, b); break;
  case Fseek: g[rBB] = mmix_fseek((unsigned char) zz, b); break;
  case Ftell: g[rBB] = mmix_ftell((unsigned char) zz); break;
  }
  x = g[255] = g[rBB]; break;
```

109.

```
⟨ Either halt or print warning 109 ⟩ ≡
  if (¬zz) halted = breakpoint = true;
  else if (zz ≡ 1) {
    if (loc.h ∨ loc.l ≥ #90) goto privileged_inst;
    print_trip_warning(loc.l ≫ 4, incr(g[rW], -4));
  } else goto privileged_inst;
```

This code is used in section 108.

110.

```
⟨ Global variables 19 ⟩ +≡
  char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};
  char *trap_format[] = {"Halt(%z)",
        "$255 = Fopen(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
        "$255 = Fclose(%!z) = %x",
        "$255 = Fread(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
        "$255 = Fgets(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
        "$255 = Fgetws(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
        "$255 = Fwrite(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
        "$255 = Fputs(%!z,%#b) = %x",
        "$255 = Fputws(%!z,%#b) = %x",
        "$255 = Fseek(%!z,%b) = %x",
        "$255 = Ftell(%!z) = %x"};
```

111.

```
⟨ Prepare memory arguments ma = M[a] and mb = M[b] if needed 111 ⟩ ≡
  if (arg_count[yy] ≡ 3) {
    ll = mem_find(b); test_load_bkpt(ll); test_load_bkpt(ll + 1);
    mb.h = ll→tet, mb.l = (ll + 1)→tet;
    ll = mem_find(a); test_load_bkpt(ll); test_load_bkpt(ll + 1);
    ma.h = ll→tet, ma.l = (ll + 1)→tet;
  }
```

This code is used in section 108.
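
The arg_count table decides whether a system call takes its arguments from a two-octabyte parameter block in memory: only the calls with three arguments do, and for those the block starts at the address b and continues at a = b + 8. The C sketch below shows that decision on a toy memory. It is an assumption-laden illustration rather than the simulator's code: `trap_args`, `prepare_args`, and the octabyte-indexed array `mem8` are hypothetical, and b is assumed to be a multiple of 8 that lies inside the array.

```c
#include <assert.h>
#include <stdint.h>

/* Number of arguments of each system call, indexed by Halt = 0 .. Ftell = 10 */
static const char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};

typedef struct {
    int fetched;           /* nonzero if the parameter block was read */
    uint64_t mb, ma;       /* M8[b] and M8[b + 8] */
} trap_args;

/* Hypothetical helper: mem8 is a toy memory indexed by octabyte (address / 8) */
static trap_args prepare_args(int yy, uint64_t b, const uint64_t *mem8)
{
    assert(yy >= 0 && yy <= 10 && (b & 7) == 0);
    trap_args t = {0, 0, 0};
    if (arg_count[yy] == 3) {
        uint64_t a = b + 8;
        t.mb = mem8[b / 8];
        t.ma = mem8[a / 8];
        t.fetched = 1;
    }
    return t;
}
```

Calls with one or two arguments (Fclose, Fputs, Fseek, and so on) never touch memory at this stage, so they cannot trigger a load breakpoint here.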

112. The input/output operations invoked by TRAPs are done by subroutines in an auxiliary program module called MMIX-IO. Here we need only declare those subroutines, and write three primitive interfaces on which they depend.

```
extern void mmix_io_init ARGS((void));
extern octa mmix_fopen ARGS((unsigned char, octa, octa));
extern octa mmix_fclose ARGS((unsigned char));
extern octa mmix_fread ARGS((unsigned char, octa, octa));
extern octa mmix_fgets ARGS((unsigned char, octa, octa));
extern octa mmix_fgetws ARGS((unsigned char, octa, octa));
extern octa mmix_fwrite ARGS((unsigned char, octa, octa));
extern octa mmix_fputs ARGS((unsigned char, octa));
extern octa mmix_fputws ARGS((unsigned char, octa));
extern octa mmix_fseek ARGS((unsigned char, octa));
extern octa mmix_ftell ARGS((unsigned char));
extern void print_trip_warning ARGS((int, octa));
extern void mmix_fake_stdin ARGS((FILE *));
```

```
a: octa, §61.  ARGS = macro(), §11.  b: octa, §61.  breakpoint: bool, §61.
exc: int, §61.  Fclose = 2, §59.  Fgets = 4, §59.  Fgetws = 5, §59.
FILE, <stdio.h>.  Fopen = 1, §59.  Fputs = 7, §59.  Fputws = 8, §59.
Fread = 3, §59.  Fseek = 9, §59.  Ftell = 10, §59.  Fwrite = 6, §59.
g: octa [], §76.  h: tetra, §10.  H_BIT = macro, §57.  Halt = 0, §59.
halted: bool, §61.  incr: octa (), MMIX-ARITH §6.  inst: tetra, §61.
inst_ptr: octa, §61.  l: tetra, §10.  ll: register mem_tetra *, §62.
loc: octa, §61.  ma: octa, §61.  max_sys_call = macro, §59.  mb: octa, §61.
mem_find: mem_tetra *(), §20.  mmix_fake_stdin: void (), MMIX-IO §10.
mmix_fclose: octa (), MMIX-IO §11.  mmix_fgets: octa (), MMIX-IO §14.
mmix_fgetws: octa (), MMIX-IO §16.  mmix_fopen: octa (), MMIX-IO §8.
mmix_fputs: octa (), MMIX-IO §19.  mmix_fputws: octa (), MMIX-IO §20.
mmix_fread: octa (), MMIX-IO §12.  mmix_fseek: octa (), MMIX-IO §21.
mmix_ftell: octa (), MMIX-IO §22.  mmix_fwrite: octa (), MMIX-IO §18.
mmix_io_init: void (), MMIX-IO §7.  octa = struct, §10.
print_trip_warning: void (), MMIX-IO §23.  privileged_inst: label, §107.
rBB = 7, §55.  rhs = macro, §139.  rW = 24, §55.  rWW = 28, §55.
rXX = 29, §55.  rYY = 30, §55.  rZZ = 31, §55.  sign_bit = macro, §15.
strcpy: char *(), <string.h>.  test_load_bkpt = macro(), §83.
tet: tetra, §16.  TRAP = #00, §54.  TRIP = #ff, §54.  true = 1, §9.
x: octa, §61.  xx: register int, §62.  y: octa, §61.  yy: register int, §62.
z: octa, §61.  zz: register int, §62.
```

114. The subroutine mmgetchars(buf, size, addr, stop) reads characters starting at address addr in the simulated memory and stores them in buf, continuing until size characters have been read or some other stopping criterion has been met. If stop < 0 there is no other criterion; if stop = 0 a null character will also terminate the process; otherwise addr is even, and two consecutive null bytes starting at an even address will terminate the process. The number of bytes read and stored, exclusive of terminating nulls, is returned.
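The three stopping modes can be illustrated with a minimal sketch that works one byte at a time. It assumes a flat byte array `mem` in place of the simulated memory, and the name `get_chars` is hypothetical; the real routine goes through `mem_find` and also has a four-byte fast path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mmgetchars: mem is a flat byte array, not MMIX memory. */
static int get_chars(char *buf, int size, const unsigned char *mem, size_t addr, int stop)
{
    int m;
    for (m = 0; m < size; m++, addr++) {
        buf[m] = (char)mem[addr];
        if (buf[m] == 0 && stop >= 0) {
            if (stop == 0) return m;   /* a single null byte ends the string */
            /* stop > 0: the null pair must begin at an even address */
            if ((addr & 1) && m > 0 && buf[m - 1] == 0) return m - 1;
        }
    }
    return size;                       /* size bytes read without hitting a stop */
}
```

With `stop > 0`, a pair of null bytes that straddles an odd and an even address does not terminate the scan; only a pair that starts at an even address does.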

```
\langle Subroutines 12\rangle +\equiv
  int mmgetchars ARGS((char *, int, octa, int));
  int mmgetchars(buf, size, addr, stop)
       char *buf;
       int size;
       octa addr;
       int stop;
  {
    register char *p;
    register int m;
    register mem_tetra *ll;
    register tetra x;
    octa a;
    for (p = buf, m = 0, a = addr; m < size; ) {
      ll = mem_find(a); test_load_bkpt(ll);
      x = ll \rightarrow tet;
      if ((a.l & #3) \lor m > size - 4) \langle Read and store one byte; return if done 115\rangle
      else \langle Read and store up to four bytes; return if done 116\rangle
    }
    return size;
  }

115. \langle Read and store one byte; return if done 115\rangle \equiv
  {
    *p = (x \gg (8 * ((\sim a.l) & #3))) & #ff;
    if (\neg *p \land stop \geq 0) {
      if (stop \equiv 0) return m;
      if ((a.l & #1) \land *(p - 1) \equiv '\0') return m - 1;
    }
    p++, m++, a = incr(a, 1);
  }
This code is used in section 114.

116. \langle Read and store up to four bytes; return if done 116\rangle \equiv
  {
    *p = x \gg 24;
    if (\neg *p \land (stop \equiv 0 \lor (stop > 0 \land x < #10000))) return m;
    *(p + 1) = (x \gg 16) & #ff;
    if (\neg *(p + 1) \land stop \equiv 0) return m + 1;
    *(p + 2) = (x \gg 8) & #ff;
    if (\neg *(p + 2) \land (stop \equiv 0 \lor (stop > 0 \land (x & #ffff) \equiv 0))) return m + 2;
    *(p + 3) = x & #ff;
    if (\neg *(p + 3) \land stop \equiv 0) return m + 3;
    p += 4, m += 4, a = incr(a, 4);
  }
This code is used in section 114.
```

117. The subroutine mmputchars(buf, size, addr) puts size characters into the simulated memory starting at address addr.

```
\langle Subroutines 12\rangle +\equiv
  void mmputchars ARGS((unsigned char *, int, octa));
  void mmputchars(buf, size, addr)
       unsigned char *buf;
       int size;
       octa addr;
  {
    register unsigned char *p;
    register int m;
    register mem_tetra *ll;
    octa a;
    for (p = buf, m = 0, a = addr; m < size; ) {
      ll = mem_find(a); test_store_bkpt(ll);
      if ((a.l & #3) \lor m > size - 4) \langle Load and write one byte 118\rangle
      else \langle Load and write four bytes 119\rangle;
    }
  }

118. \langle Load and write one byte 118\rangle \equiv
  {
    register int s = 8 * ((\sim a.l) & #3);
    ll \rightarrow tet \oplus= (((ll \rightarrow tet \gg s) \oplus *p) & #ff) \ll s;
    p++, m++, a = incr(a, 1);
  }
This code is used in section 117.

119. \langle Load and write four bytes 119\rangle \equiv
  {
    ll \rightarrow tet = (*p \ll 24) + (*(p + 1) \ll 16) + (*(p + 2) \ll 8) + *(p + 3);
    p += 4, m += 4, a = incr(a, 4);
  }
```

This code is used in section 117.
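The one-byte store in section 118 replaces a byte inside a big-endian tetrabyte by XORing in only the bits that differ. A small sketch of that idea follows, using `uint32_t` in place of the simulator's tetra type; the function names `store_byte` and `pack_tetra` are illustrative and do not appear in the simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the byte of big-endian tetrabyte t selected by the low two bits of addr. */
static uint32_t store_byte(uint32_t t, uint32_t addr, unsigned char c)
{
    int s = 8 * (int)((~addr) & 3);      /* byte 0 is most significant: shift 24 */
    t ^= (((t >> s) ^ c) & 0xff) << s;   /* flip exactly the bits that differ */
    return t;
}

/* Pack four bytes, most significant first, as in the four-byte case. */
static uint32_t pack_tetra(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) + ((uint32_t)p[1] << 16)
         + ((uint32_t)p[2] << 8) + (uint32_t)p[3];
}
```

The XOR form needs no separate mask-and-merge step: bits of the old byte that already equal the new byte contribute zero and are left alone.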

120. When standard input is being read by the simulated program at the same time as it is being used for interaction, we try to keep the two uses separate by maintaining a private buffer for the simulated program's StdIn. Online input is usually transmitted from the keyboard to a C program a line at a time; therefore an fgets operation works much better than fread when we prompt for new input. But there is a slight complication, because fgets might read a null character before coming to a newline character. We cannot deduce the number of characters read by fgets simply by looking at strlen(stdin\_buf).
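The point about strlen can be seen in a short sketch. It assumes a 256-byte buffer that has already been filled, and the helper name `line_end` is hypothetical; like the simulator, it scans for the newline and stops at position cap - 2 if none is found.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer just past the line in buf, scanning for '\n' rather than using strlen. */
static char *line_end(char *buf, size_t cap)
{
    char *p;
    for (p = buf; p < buf + cap - 2; p++)
        if (*p == '\n') break;
    return p + 1;
}
```

For a line containing an embedded null byte, strlen reports only the part before the null, while the newline scan still finds the true end of the line.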

```
\langle Subroutines 12\rangle +\equiv
  char stdin_chr ARGS((void));
  char stdin_chr()
  {
    register char *p;
    while (stdin_buf_start \equiv stdin_buf_end) {
      if (interacting) {
        printf("StdIn> "); fflush(stdout);
      }
      if (\neg fgets(stdin_buf, 256, stdin))
        panic("End of file on standard input; use the -f option, not <");
      stdin_buf_start = stdin_buf;
      for (p = stdin_buf; p < stdin_buf + 254; p++)
        if (*p \equiv '\n') break;
      stdin_buf_end = p + 1;
    }
    return *stdin_buf_start++;
  }

121. \langle Global variables 19\rangle +\equiv
  char stdin_buf[256];     /* standard input to the simulated program */
  char *stdin_buf_start;   /* current position in that buffer */
  char *stdin_buf_end;     /* current end of that buffer */
```

122. Just after executing each instruction, we do the following. Underflow that is exact and not enabled is ignored. (This applies also to underflow that was triggered by RESUME_SET.)

```
\langle Check for trip interrupt 122\rangle \equiv
  if ((exc & (U_BIT + X_BIT)) \equiv U_BIT \land \neg(g[rA].l & U_BIT)) exc &= \sim U_BIT;
  if (exc) {
    if (exc & tracing_exceptions) tracing = true;
    j = exc & (g[rA].l | H_BIT);   /* find all exceptions that have been enabled */
    if (j) \langle Initiate a trip interrupt 123\rangle;
    g[rA].l |= exc \gg 8;
  }
This code is used in section 60.

123. \langle Initiate a trip interrupt 123\rangle \equiv
  {
    tripping = true;
    for (k = 0; \neg(j & H_BIT); j \ll= 1, k++) ;
    exc &= \sim(H_BIT \gg k);   /* trips taken are not logged as events */
    g[rW] = inst_ptr;
    inst_ptr.h = 0, inst_ptr.l = k \ll 4;
    g[rX].h = sign_bit, g[rX].l = inst;
    if ((op & #e0) \equiv STB) g[rY] = w, g[rZ] = b;
    else g[rY] = y, g[rZ] = z;
    g[rB] = g[255];
    g[255] = g[rJ];
    if (op \equiv TRIP) w = g[rW], x = g[rX], a = g[255];
  }
This code is used in section 122.
```

124. We are finally ready for the last case.

```
\langle Cases for individual MMIX instructions 84\rangle +\equiv
case RESUME: if (xx \lor yy \lor zz) goto illegal_inst;
  inst_ptr = z = g[rW];
  b = g[rX];
  if (\neg(b.h & sign_bit)) \langle Prepare to perform a ropcode 125\rangle;
  break;
```
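The loop that initiates a trip picks the leftmost enabled exception bit and turns its position into a handler address k ≪ 4. The sketch below isolates that search. The bit values for H_BIT, D_BIT, and X_BIT are defined in section 57, which is not part of this excerpt, so the values used here are assumptions; `trip_address` is a hypothetical name.

```c
#include <assert.h>

#define H_BIT (1 << 16)  /* assumed value; defined in section 57 */
#define D_BIT (1 << 15)  /* assumed value */
#define X_BIT (1 << 8)   /* assumed value */

/* Return the handler address k<<4 for the leftmost set bit of j, or -1 if none is set. */
static int trip_address(int j)
{
    int k;
    j &= (H_BIT << 1) - 1;   /* keep only bit H_BIT and the bits below it */
    if (j == 0) return -1;
    for (k = 0; !(j & H_BIT); j <<= 1, k++) ;
    return k << 4;
}
```

Under these assumed bit positions, a TRIP instruction lands at address 0 and each exception bit to the right of H_BIT lands 16 bytes further along.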

```
a: octa, §117.  ARGS = macro(), §11.  b: octa, §61.  exc: int, §61.
fflush: int (), <stdio.h>.  fgets: char *(), <stdio.h>.
fread: size_t (), <stdio.h>.  g: octa [], §76.  h: tetra, §10.
H_BIT = macro, §57.  illegal_inst: label, §107.  inst: tetra, §61.
inst_ptr: octa, §61.  interacting: bool, §61.  j: register int, §62.
k: register int, §62.  l: tetra, §10.  op: register mmix_opcode, §62.
panic = macro(), §14.  printf: int (), <stdio.h>.  rA = 21, §55.
rB = 0, §55.  RESUME = #f9, §54.  RESUME_SET = 2, §125.  rJ = 4, §55.
rW = 24, §55.  rX = 25, §55.  rY = 26, §55.  rZ = 27, §55.
sign_bit = macro, §15.  STB = #a0, §54.  stdin: FILE *, <stdio.h>.
stdout: FILE *, <stdio.h>.  strlen: size_t (), <string.h>.
tracing: bool, §61.  tracing_exceptions: int, §61.  TRIP = #ff, §54.
tripping: bool, §61.  true = 1, §9.  U_BIT = macro, §57.  w: octa, §61.
x: register tetra, §114.  X_BIT = macro, §57.  xx: register int, §62.
y: octa, §61.  yy: register int, §62.  z: octa, §61.  zz: register int, §62.
```

125. Here we check to see if the ropcode restrictions hold. If so, the ropcode will actually be obeyed on the next fetch phase.

```
#define RESUME_AGAIN 0   /* repeat the command in rX as if in location rW - 4 */
#define RESUME_CONT 1    /* same, but substitute rY and rZ for operands */
#define RESUME_SET 2     /* set register $X to rZ */

\langle Prepare to perform a ropcode 125\rangle \equiv
  {
    rop = b.h \gg 24;   /* the ropcode is the leading byte of rX */
    switch (rop) {
    case RESUME_CONT: if ((1 \ll (b.l \gg 28)) & #8f30) goto illegal_inst;
    case RESUME_SET: k = (b.l \gg 16) & #ff;
      if (k \geq L \land k < G) goto illegal_inst;
    case RESUME_AGAIN: if ((b.l \gg 24) \equiv RESUME) goto illegal_inst;
      break;
    default: goto illegal_inst;
    }
    resuming = true;
  }
This code is used in section 124.

126. \langle Install special operands when resuming an interrupted operation 126\rangle \equiv
  if (rop \equiv RESUME_SET) {
    op = ORI;
    y = g[rZ];
    z = zero_octa;
    exc = g[rX].h & #ff00;
    f = X_is_dest_bit;
  } else {   /* RESUME_CONT */
    y = g[rY];
    z = g[rZ];
  }
This code is used in section 71.

127. We don't want to count the UNSAVE that bootstraps the whole process.

\langle Update the clocks 127\rangle \equiv
  if (sclock.l \lor sclock.h \lor \neg resuming) {
    sclock.h += info[op].mems;   /* clock goes up by 2^{32} for each \mu */
    sclock = incr(sclock, info[op].oops);   /* clock goes up by 1 for each \upsilon */
    if ((\neg(loc.h & sign_bit) \lor (g[rU].h & #8000)) \land
        ((op & (g[rU].h \gg 16)) \equiv (g[rU].h \gg 24))) {
      g[rU].l++;
      if (g[rU].l \equiv 0) { g[rU].h++; if ((g[rU].h & #7fff) \equiv 0) g[rU].h -= #8000; }
    }   /* usage counter counts matched instructions simulated */
    if (g[rI].l \leq info[op].oops \land g[rI].l \land g[rI].h \equiv 0) tracing = breakpoint = true;
    g[rI] = incr(g[rI], -info[op].oops);   /* interval timer counts down */
  }
```

This code is used in section 60.
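The usage-counter test in section 127 reads a pattern and a mask from the high half of rU and counts an instruction only when the masked opcode equals the pattern. The sketch below isolates that predicate. The name `usage_matches` and the flag `loc_negative` are illustrative only, and the field layout is inferred from the shifts in the code above rather than taken from the MMIX specification.

```c
#include <assert.h>
#include <stdint.h>

/* Decide whether an opcode is counted, given the high tetrabyte of rU.
   Layout inferred from the code: bits 31..24 pattern, bits 23..16 mask,
   bit 15 enables counting at negative (kernel) locations. */
static int usage_matches(uint32_t rU_h, unsigned char op, int loc_negative)
{
    uint32_t pattern = rU_h >> 24;
    uint32_t mask = (rU_h >> 16) & 0xff;
    if (loc_negative && !(rU_h & 0x8000)) return 0;
    return (op & mask) == pattern;
}
```

With rU equal to zero the mask and pattern are both zero, so every instruction at a nonnegative location is counted.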

128. Tracing. After an instruction has been executed, we often want to display its effect. This part of the program prints out a symbolic interpretation of what has just happened.

```
\langle Trace the current instruction, if requested 128\rangle \equiv
  if (tracing) {
    if (showing_source \land cur_line) show_line();
    \langle Print the frequency count, the location, and the instruction 130\rangle;
    \langle Print a stream-of-consciousness description of the instruction 131\rangle;
    if (showing_stats \lor breakpoint) show_stats(breakpoint);
    just_traced = true;
  } else if (just_traced) {
    printf(" ....\n");
    just_traced = false;
    shown_line = -gap - 1;   /* gap will not be filled */
  }
This code is used in section 60.

129. \langle Global variables 19\rangle +\equiv
  bool showing_stats;   /* should traced instructions also show the statistics? */
  bool just_traced;     /* was the previous instruction traced? */
```

```
b: octa, §61.  bool = enum, §9.  breakpoint: bool, §61.  cur_line: int, §31.
exc: int, §61.  f: register tetra, §62.  false = 0, §9.
G: register int, §75.  g: octa [], §76.  gap: int, §48.  h: tetra, §10.
illegal_inst: label, §107.  incr: octa (), MMIX-ARITH §6.
info: op_info [], §65.  k: register int, §62.  L: register int, §75.
l: tetra, §10.  loc: octa, §61.  mems: unsigned char, §64.
oops: unsigned char, §64.  op: register mmix_opcode, §62.  ORI = #c1, §54.
printf: int (), <stdio.h>.  RESUME = #f9, §54.  resuming: bool, §61.
rI = 12, §55.  rop: int, §61.  rU = 17, §55.  rX = 25, §55.  rY = 26, §55.
rZ = 27, §55.  sclock: octa, §19.  show_line: void (), §47.
show_stats: void (), §140.  showing_source: bool, §48.  shown_line: int, §48.
sign_bit = macro, §15.  tracing: bool, §61.  true = 1, §9.
X_is_dest_bit = #20, §65.  y: octa, §61.  z: octa, §61.
zero_octa: octa, MMIX-ARITH §4.
```

This code is used in section 128.

131. This part of the simulator was inspired by ideas of E. H. Satterthwaite, Software—Practice and Experience 2 (1972), 197–217. Online debugging tools have improved significantly since Satterthwaite published his work, but good offline tools are still valuable; alas, today's algebraic programming languages do not provide tracing facilities that come anywhere close to the level of quality that Satterthwaite was able to demonstrate for ALGOL in 1970.

```
\langle Print a stream-of-consciousness description of the instruction 131\rangle \equiv
  if (lhs[0] \equiv '!') printf("%s instruction!\n"
```

This code is used in section 128.

132. Push, pop, and UNSAVE instructions display changes to rL and rO explicitly; otherwise the change is implicit, if  $L \neq old\_L$ .

This code is used in section 131.

ADDUI = #23, §54. exc: int, §61. f: register tetra, §62. false = 0, §9. freq: tetra, §16. g: octa [], §76. h: tetra, §10. info: op\_info [], §65. inst: tetra, §61. inst\_ptr: octa, §61. l: tetra, §10. L: register int, §75. lhs: char [], §139. ll: register mem\_tetra \*, §62. loc: octa, §61. mem\_find: mem\_tetra \*(), §20. name: char \*, §64. old\_L: int, §61. op: register mmix\_opcode, §62. ORI = #c1, §54. p: register char \*, §62.

133. Each MMIX instruction has a *trace format* string, which defines its symbolic representation. For example, the string for ADD is "%l = %y + %z = %x"; if the instruction is, say, ADD \$1,\$2,\$3 with \$2 = 5 and \$3 = 8, and if the stack offset is 100, the trace output will be "\$1=l[101] = 5 + 8 = 13".

Percent signs (%) induce special format conventions, as follows:

- %a, %b, %p, %q, %w, %x, %y, and %z stand for the numeric contents of octabytes a, b, ma, mb, w, x, y, and z, respectively; a "style" character may follow the percent sign in this case, as explained below.
- %( and %) are brackets that indicate the mode of floating point rounding. If round\_mode = ROUND\_NEAR, ROUND\_OFF, ROUND\_UP, ROUND\_DOWN, the corresponding brackets are ( and ), [ and ], ^ and ^, \_ and \_. Such brackets are placed around a floating point operator; for example, floating point addition is denoted by '[+]' when the current rounding mode is rounding-off.
- %l stands for the string lhs, which usually represents the "left hand side" of the instruction just performed, formatted as a register number and its equivalent in the ring of local registers (e.g., '\$1=l[101]') or as a register number and its equivalent in the array of global registers (e.g., '\$255=g[255]'). The POP instruction uses lhs to indicate how the "hole" in the register stack was plugged.
- %r means to switch to string rhs and continue formatting from there. This mechanism allows us to use variable formats for opcodes like TRAP that have several variants.
- %t means to print either 'Yes, ->loc' (where loc is the location of the next instruction) or 'No', depending on the value of x.
- %g means to print ' (bad guess)' if good is false.
- %s stands for the name of special register g[zz].
- %? stands for omission of the following operator if z = 0. For example, the memory address of LDBI is described by '%#y%?+'; this means to treat the address as simply '%#y' if z = 0, otherwise as '%#y+%z'. This case is used only when z is a relatively small number (z.h = 0).
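A few of these conventions can be tried out in a small sketch. It handles only %x, %y, %z, the '#' style, and %?, takes plain unsigned long operands instead of octabytes, and writes into a caller-supplied buffer; the name `mini_trace` and its signature are hypothetical and are not part of the simulator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand a tiny subset of the trace format: %x, %y, %z, the '#' style, and %?. */
static void mini_trace(char *out, size_t cap, const char *fmt,
                       unsigned long x, unsigned long y, unsigned long z)
{
    size_t n = 0;
    const char *p;
    if (cap == 0) return;
    out[0] = '\0';
    for (p = fmt; *p && n + 1 < cap; p++) {
        int hex = 0;
        unsigned long v;
        if (*p != '%') {                 /* ordinary characters are copied */
            out[n++] = *p;
            out[n] = '\0';
            continue;
        }
        if (*++p == '#') hex = 1, p++;   /* optional style character */
        switch (*p) {
        case 'x': v = x; break;
        case 'y': v = y; break;
        case 'z': v = z; break;
        case '?':                        /* emit the next operator and z only if z != 0 */
            if (p[1] == '\0') return;
            p++;
            if (z) n += (size_t)snprintf(out + n, cap - n, "%c%lu", *p, z);
            if (n >= cap) n = cap - 1;
            continue;
        default: return;                 /* unsupported directive or end of string */
        }
        n += (size_t)snprintf(out + n, cap - n, hex ? "#%lx" : "%lu", v);
        if (n >= cap) n = cap - 1;
    }
}
```

For instance, the address format "%#y%?+" expands to "#1000" when z is 0 and to "#1000+8" when z is 8.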

```
\langle Interpret character *p in the trace format 133\rangle \equiv
  {
    if (*p \neq '%') fputc(*p, stdout);
    else {
      style = decimal;
    char_switch:
      switch (*++p) {
        \langle Cases for formatting characters 134\rangle;
      default: printf("BUG!!");   /* can't happen */
      }
    }
  }
```

This code is used in section 131.

134. Octabytes are printed as decimal numbers unless a "style" character intervenes between the percent sign and the name of the octabyte: '#' denotes hexadecimal notation, prefixed by #; '0' denotes hexadecimal notation with no prefixed # and with leading zeros not suppressed; '.' denotes floating decimal notation; and '!' means to use the names StdIn, StdOut, or StdErr if the value is 0, 1, or 2.

```
\langle Cases for formatting characters 134\rangle \equiv
case '#': style = hex; goto char_switch;
case '0': style = zhex; goto char_switch;
case '.': style = floating; goto char_switch;
case '!': style = handle; goto char_switch;
See also sections 136 and 138.
This code is used in section 133.

135. \langle Type declarations 9\rangle +\equiv
  typedef enum {
    decimal, hex, zhex, floating, handle
  } fmt_style;

136. \langle Cases for formatting characters 134\rangle +\equiv
case 'a': trace_print(a); break;
case 'b': trace_print(b); break;
case 'p': trace_print(ma); break;
case 'q': trace_print(mb); break;
case 'w': trace_print(w); break;
case 'x': trace_print(x); break;
case 'y': trace_print(y); break;
case 'z': trace_print(z); break;
```

```
a: octa, §61.  b: octa, §61.  false = 0, §9.  fputc: int (), <stdio.h>.
g: octa [], §76.  good: bool, §61.  h: tetra, §10.  lhs: char [], §139.
ma: octa, §61.  mb: octa, §61.  p: register char *, §62.
printf: int (), <stdio.h>.  rhs = macro, §139.  ROUND_DOWN = 3, §100.
round_mode: int, §61.  ROUND_NEAR = 4, §100.  ROUND_OFF = 1, §100.
ROUND_UP = 2, §100.  stdout: FILE *, <stdio.h>.  style: fmt_style, §137.
trace_print: void (), §137.  w: octa, §61.  x: octa, §61.  y: octa, §61.
z: octa, §61.  zz: register int, §62.
```

```
137. \langle Subroutines 12\rangle +\equiv
  fmt_style style;
  char *stream_name[] = {"StdIn", "StdOut", "StdErr"};
  void trace_print ARGS((octa));
  void trace_print(o)
       octa o;
  {
    switch (style) {
    case decimal: print_int(o); return;
    case hex: fputc('#', stdout); print_hex(o); return;
    case zhex: printf("%08x%08x", o.h, o.l); return;
    case floating: print_float(o); return;
    case handle: if (o.h \equiv 0 \land o.l < 3) printf(stream_name[o.l]);
      else print_int(o); return;
    }
  }

138. \langle Cases for formatting characters 134\rangle +\equiv
case '(': fputc(left_paren[round_mode], stdout); break;
case ')': fputc(right_paren[round_mode], stdout); break;
case 't': if (x.l) printf(" Yes, -> #"), print_hex(inst_ptr);
  else printf(" No"); break;
case 'g': if (\neg good) printf(" (bad guess)"); break;
case 's': printf(special_name[zz]); break;
case '?': p++; if (z.l) printf("%c%d", *p, z.l); break;
case 'l': printf(lhs); break;
case 'r': p = switchable_string; break;

139. #define rhs &switchable_string[1]

\langle Global variables 19\rangle +\equiv
  char left_paren[] = {0, '[', '^', '_', '('};    /* denotes the rounding mode */
  char right_paren[] = {0, ']', '^', '_', ')'};   /* denotes the rounding mode */
  char switchable_string[48];   /* holds rhs; position 0 is ignored */
    /* switchable_string must be able to hold any trap_format */
  char lhs[32];
  int good_guesses, bad_guesses;   /* branch prediction statistics */
```

```
ARGS = macro(), §11.  bool = enum, §9.  decimal = 0, §135.
floating = 3, §135.  fmt_style = enum, §135.  fputc: int (), <stdio.h>.
g: octa [], §76.  good: bool, §61.  h: tetra, §10.  halted: bool, §61.
handle = 4, §135.  hex = 1, §135.  incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.  l: tetra, §10.  octa = struct, §10.
p: register char *, §62.  print_float: void (), MMIX-ARITH §54.
print_hex: void (), §12.  print_int: void (), §15.
printf: int (), <stdio.h>.  round_mode: int, §61.  rU = 17, §55.
sclock: octa, §19.  special_name: char *[], §56.
stdout: FILE *, <stdio.h>.  trap_format: char *[], §110.  x: octa, §61.
z: octa, §61.  zhex = 2, §135.  zz: register int, §62.
```

**141.** Running the program. Now we are ready to fit the pieces together into a working simulator.

```
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <signal.h>
#include "abstime.h"
  \langle Preprocessor macros 11\rangle
  \langle Type declarations 9\rangle
  \langle Global variables 19\rangle
  \langle Subroutines 12\rangle
  int main(argc, argv)
       int argc;
       char *argv[];
  {
    \langle Local registers 62\rangle;
    mmix_io_init();
    \langle Process the command line 142\rangle;
    \langle Initialize everything 14\rangle;
    \langle Load the command line arguments 163\rangle;
    \langle Get ready to UNSAVE the initial context 164\rangle;
    while (1) {
      if (interrupt \land \neg breakpoint) breakpoint = interacting = true, interrupt = false;
      else {
        breakpoint = false;
        if (interacting) \langle Interact with the user 149\rangle;
      }
      if (halted) break;
      do \langle Perform one instruction 60\rangle
      while ((\neg interrupt \land \neg breakpoint) \lor resuming);
      if (interact_after_break) interacting = true, interact_after_break = false;
    }
  end_simulation: if (profiling) \langle Print all the frequency counts 53\rangle;
    if (interacting \lor profiling \lor showing_stats) show_stats(true);
    return g[255].l;   /* provide rudimentary feedback for non-interactive runs */
  }
```

142. Here we process the command line options; when we finish,  $*cur\_arg$  should be the name of the object file to be loaded and simulated.

We assume that argv[0] is never null. (The author believes strongly that the wizards who decided to allow argc=0 were mistaken when they defined the C89 standard; hence he has taken no pains to avoid system crashes when people try to invoke any of his programs with a null environment. Null invocations are contrary to the intent of C's designers.)

```
#define mmo_file_name *cur_arg

\langle Process the command line 142\rangle \equiv
  myself = argv[0];
  for (cur_arg = argv + 1; *cur_arg \land (*cur_arg)[0] \equiv '-'; cur_arg++)
    scan_option(*cur_arg + 1, true);
  if (\neg *cur_arg) scan_option("?", true);   /* exit with usage note */
  argc -= cur_arg - argv;   /* this is the argc of the user program */
This code is used in section 141.
```
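The scan over leading options can be sketched on its own. The version below only locates the first argument that does not begin with '-', which is where the object file name is expected; `first_non_option` is a hypothetical helper, and unlike the simulator it uses an index bounded by argc rather than relying on the null pointer that ends argv.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first argument that is not an option, or argc if none. */
static int first_non_option(int argc, char *argv[])
{
    int i;
    for (i = 1; i < argc && argv[i][0] == '-'; i++) ;
    return i;
}
```

Arguments after the object file name are left alone even if they begin with '-', since they belong to the simulated program.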

profiling: bool, §144. resuming: bool, §61. scan\_option: void (), §143. show\_stats: void (), §140. showing\_stats: bool, §129. true = 1, §9.

143. The -b and -c options are effective only on the command line, but they are harmless while interacting.

```
\langle Subroutines 12\rangle +\equiv
  void scan_option ARGS((char *, bool));
  void scan_option(arg, usage)
       char *arg;    /* command line argument (without the '-') */
       bool usage;   /* should we exit with usage note if unrecognized? */
  {
    register int k;
    switch (*arg) {
    case 't': if (strlen(arg) > 10) trace_threshold = #ffffffff;
      else if (sscanf(arg + 1, "%d", &trace_threshold) \neq 1) trace_threshold = 0;
      return;
    case 'e': if (\neg *(arg + 1)) tracing_exceptions = #ff;
      else if (sscanf(arg + 1, "%x", &tracing_exceptions) \neq 1) tracing_exceptions = 0;
      return;
    case 'r': stack_tracing = true; return;
    case 's': showing_stats = true; return;
    case 'l': if (\neg *(arg + 1)) gap = 3;
      else if (sscanf(arg + 1, "%d", &gap) \neq 1) gap = 0;
      showing_source = true; return;
    case 'L': if (\neg *(arg + 1)) profile_gap = 3;
      else if (sscanf(arg + 1, "%d", &profile_gap) \neq 1) profile_gap = 0;
      profile_showing_source = true;   /* falls through to the 'P' case */
    case 'P': profiling = true; return;
    case 'v': trace_threshold = #ffffffff; tracing_exceptions = #ff;
      stack_tracing = true; showing_stats = true;
      gap = 10, showing_source = true;
      profile_gap = 10, profile_showing_source = true, profiling = true;
      return;
    case 'q': trace_threshold = tracing_exceptions = 0;
      stack_tracing = showing_stats = showing_source = false;
      profiling = profile_showing_source = false;
      return;
    case 'i': interacting = true; return;
    case 'I': interact_after_break = true; return;
    case 'b': if (sscanf(arg + 1, "%d", &buf_size) \neq 1) buf_size = 0; return;
    case 'c': if (sscanf(arg + 1, "%d", &lring_size) \neq 1) lring_size = 0; return;
    case 'f': \langle Open a file for simulated standard input 145\rangle; return;
    case 'D': \langle Open a file for dumping binary output 146\rangle; return;
    default: if (usage) {
        fprintf(stderr, "Usage: %s <options> progfile command-line-args...\n", myself);
        for (k = 0; usage_help[k][0]; k++) fprintf(stderr, usage_help[k]);
        exit(-1);
      } else for (k = 0; usage_help[k][1] \neq 'b'; k++) printf(usage_help[k]);
      return;
    }
  }
```

```
ARGS = macro (), §11.
bool = enum, §9.
buf_size: int, §40.
exit: void (), <stdlib.h>.
false = 0, §9.
fprintf: int (), <stdio.h>.
gap: int, §48.
interact_after_break: bool, §61.
interacting: bool, §61.
lring_size: int, §76.
myself: char *, §144.
printf: int (), <stdio.h>.
profile_gap: int, §48.
profile_showing_source: bool, §48.
profiling: bool, §144.
showing_source: bool, §48.
showing_stats: bool, §129.
sscanf: int (), <stdio.h>.
stack_tracing: bool, §61.
stderr: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
trace_threshold: tetra, §61.
tracing_exceptions: int, §61.
true = 1, §9.
usage_help: char *[], §144.
```

```
144. ⟨Global variables 19⟩ +≡
  char *myself;       /* argv[0], the name of this simulator */
  char **cur_arg;     /* pointer to current place in the argument vector */
  bool interrupt;     /* has the user interrupted the simulation recently? */
  bool profiling;     /* should we print the profile at the end? */
  FILE *fake_stdin;   /* file substituted for the simulated StdIn */
  FILE *dump_file;    /* file used for binary dumps */
  char *usage_help[] = {
    " with these options: (<n>=decimal number, <x>=hex number)\n",
    "-t<n> trace each instruction the first n times\n",
    "-e<x> trace each instruction with an exception matching x\n",
    "-r    trace hidden details of the register stack\n",
    "-l<n> list source lines when tracing, filling gaps <= n\n",
    "-s    show statistics after each traced instruction\n",
    "-P    print a profile when simulation ends\n",
    "-L<n> list source lines with the profile\n",
    "-q    be quiet: show only the simulated standard output\n",
    "-i    run interactively (prompt for online commands)\n",
    "-I    interact, but only after the program halts\n",
    "-b<n> change the buffer size for source lines\n",
    "-c<n> change the cyclic local register ring size\n",
    "-f<filename> use given file to simulate standard input\n",
    "-D<filename> dump a file for use by other simulators\n",
    ""};
  char *interactive_help[] = {
    "The interactive commands are:\n",
    "<return>  trace one instruction\n",
    "n         trace one instruction\n",
    "c         continue until halt or breakpoint\n",
    "s         show current statistics\n",
    "g<n><t>   set and/or show global register in format t\n",
    "rA<t>     set and/or show register rA in format t\n",
    "$<n><t>   set and/or show dynamic register in format t\n",
    "M<x><t>   set and/or show memory octabyte in format t\n",
    " <t> is ! (decimal) or . (floating) or # (hex) or \" (string)\n",
    "     or <empty> (previous <t>) or =<value> (change value)\n",
    "@<x>      go to location x\n",
    "b[rwx]<x> set or reset breakpoint at location x\n",
    "t<x>      trace location x\n",
    "u<x>      untrace location x\n",
    "T         set current segment to Text_Segment\n",
    "P         set current segment to Pool_Segment\n",
    "S         set current segment to Stack_Segment\n",
    "-<option> change a tracing/listing/profile option\n",
    "-?        show the tracing/listing/profile options\n",
    ""};
```

```
145. ⟨Open a file for simulated standard input 145⟩ ≡
  if (fake_stdin) fclose(fake_stdin);
  fake_stdin = fopen(arg + 1, "r");
  if (¬fake_stdin) fprintf(stderr, "Sorry, I can't open file %s!\n", arg + 1);
  else mmix_fake_stdin(fake_stdin);
This code is used in section 143.

146. ⟨Open a file for dumping binary output 146⟩ ≡
  dump_file = fopen(arg + 1, "wb");
  if (¬dump_file) fprintf(stderr, "Sorry, I can't open file %s!\n", arg + 1);
This code is used in section 143.

147. ⟨Initialize everything 14⟩ +≡
  signal(SIGINT, catchint);   /* now catchint will catch the first interrupt */

148. ⟨Subroutines 12⟩ +≡
  void catchint ARGS((int));
  void catchint(n)
       int n;
  {
    interrupt = true;
    signal(SIGINT, catchint);   /* now catchint will catch the next interrupt */
  }
```
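The interrupt-catching pattern of sections 147 and 148 can be sketched in present-day C as shown here. The names `interrupted`, `catch_int`, `install_handler`, and `take_interrupt` are hypothetical; the flag is declared `volatile sig_atomic_t`, a type that the C standard allows a handler to write, whereas the simulator itself uses a plain `bool`.

```c
#include <signal.h>

/* Set by the handler and polled by the main loop. */
static volatile sig_atomic_t interrupted = 0;

/* Records the interrupt and re-installs itself, because some systems reset
   the disposition of a signal to the default once it has been delivered. */
static void catch_int(int signo)
{
    (void)signo;
    interrupted = 1;
    signal(SIGINT, catch_int);
}

/* Call once during initialization. */
static void install_handler(void)
{
    signal(SIGINT, catch_int);
}

/* Returns 1 and clears the flag if an interrupt has arrived since the
   previous call; returns 0 otherwise. */
static int take_interrupt(void)
{
    if (!interrupted) return 0;
    interrupted = 0;
    return 1;
}
```

A main loop would call `take_interrupt()` between simulated instructions and switch to interactive mode when it returns 1.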

```
149. ⟨Interact with the user 149⟩ ≡
  { register int repeating;
  interact: ⟨Put a new command in command_buf 150⟩;
    p = command_buf;
    repeating = 0;
    switch (*p) {
    case '\n': case 'n': breakpoint = tracing = true;   /* trace one inst and break */
    case 'c': goto resume_simulation;   /* continue until breakpoint */
    case 'q': goto end_simulation;
    case 's': show_stats(true); goto interact;
    case '-': k = strlen(p); if (p[k - 1] ≡ '\n') p[k - 1] = '\0';
      scan_option(p + 1, false); goto interact;
    ⟨Cases that change cur_disp_mode 152⟩;
    ⟨Cases that define cur_disp_type 153⟩;
    ⟨Cases that set and clear tracing and breakpoints 161⟩;
    default: what_say: k = strlen(command_buf);
      if (k < 10 ∧ command_buf[k - 1] ≡ '\n') command_buf[k - 1] = '\0';
      else strcpy(command_buf + 9, "...");
      printf("Eh? Sorry, I don't understand '%s' (Type h for help)\n",
             command_buf);
      goto interact;
    case 'h': for (k = 0; interactive_help[k][0]; k++) printf(interactive_help[k]);
      goto interact;
    }
  check_syntax: if (*p ≠ '\n') {
      if (¬*p)
      incomplete_str: printf("Syntax error: Incomplete command!\n");
      else {
        p[strlen(p) - 1] = '\0';
        printf("Syntax error; I'm ignoring '%s'!\n", p);
      }
    }
    while (repeating) ⟨Display and/or set the value of the current octabyte 156⟩;
    goto interact;
  resume_simulation: ;
  }
This code is used in section 141.

150. ⟨Put a new command in command_buf 150⟩ ≡
  { register bool ready = false;
  incl_read: while (incl_file ∧ ¬ready)
      if (¬fgets(command_buf, command_buf_size, incl_file)) {
        fclose(incl_file);
        incl_file = Λ;
      } else if (command_buf[0] ≠ '\n' ∧ command_buf[0] ≠ 'i' ∧ command_buf[0] ≠ '%')
        if (command_buf[0] ≡ ' ') printf("%s", command_buf);
        else ready = true;
    while (¬ready) {
      printf("mmix> "); fflush(stdout);
      if (¬fgets(command_buf, command_buf_size, stdin)) command_buf[0] = 'q';
      if (command_buf[0] ≠ 'i') ready = true;
      else {
        command_buf[strlen(command_buf) - 1] = '\0';
        incl_file = fopen(command_buf + 1, "r");
        if (incl_file) goto incl_read;
        if (isspace(command_buf[1])) incl_file = fopen(command_buf + 2, "r");
        if (incl_file) goto incl_read;
        printf("Can't open file '%s'!\n", command_buf + 1);
      }
    }
  }
This code is used in section 149.

151. #define command_buf_size 1024   /* make it plenty long, for floating point tests */

⟨Global variables 19⟩ +≡
  char command_buf[command_buf_size];
  FILE *incl_file;            /* file of commands included by 'i' */
  char cur_disp_mode = 'l';   /* 'l' or 'g' or '$' or 'M' */
  char cur_disp_type = '!';   /* '!' or '.' or '#' or '"' */
  bool cur_disp_set;          /* was the last <t> of the form =<val>? */
  octa cur_disp_addr;         /* the h half is relevant only in mode 'M' */
  octa cur_seg;               /* current segment offset */
  char spec_reg_code[] = {rA, rB, rC, rD, rE, rF, rG, rH, rI, rJ, rK, rL, rM, rN, rO, rP,
      rQ, rR, rS, rT, rU, rV, rW, rX, rY, rZ};
  char spec_regg_code[] = {0, rBB, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, rTT, 0, 0, rWW,
      rXX, rYY, rZZ};
```

```
bool = enum, §9.
breakpoint: bool, §61.
end_simulation: label, §141.
false = 0, §9.
fclose: int (), <stdio.h>.
fflush: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
FILE, <stdio.h>.
fopen: FILE *(), <stdio.h>.
h: tetra, §10.
interactive_help: char *[], §144.
isspace: int (), <ctype.h>.
k: register int, §62.
octa = struct, §10.
p: register char *, §62.
printf: int (), <stdio.h>.
rA = 21, §55.
rB = 0, §55.
rBB = 7, §55.
rC = 8, §55.
rD = 1, §55.
rE = 2, §55.
rF = 22, §55.
rG = 19, §55.
rH = 3, §55.
rI = 12, §55.
rJ = 4, §55.
rK = 15, §55.
rL = 20, §55.
rM = 5, §55.
rN = 9, §55.
rO = 10, §55.
rP = 23, §55.
rQ = 16, §55.
rR = 6, §55.
rS = 11, §55.
rT = 13, §55.
rTT = 14, §55.
rU = 17, §55.
rV = 18, §55.
rW = 24, §55.
rWW = 28, §55.
rX = 25, §55.
rXX = 29, §55.
rY = 26, §55.
rYY = 30, §55.
rZ = 27, §55.
rZZ = 31, §55.
scan_option: void (), §143.
show_stats: void (), §140.
stdin: FILE *, <stdio.h>.
stdout: FILE *, <stdio.h>.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
tracing: bool, §61.
true = 1, §9.
```

```
152. ⟨Cases that change cur_disp_mode 152⟩ ≡
case 'l': case 'g': case '$': cur_disp_mode = *p++;
  for (cur_disp_addr.l = 0; isdigit(*p); p++)
    cur_disp_addr.l = 10 * cur_disp_addr.l + *p - '0';
  goto new_mode;
case 'r': p++; cur_disp_mode = 'g';
  if (*p < 'A' ∨ *p > 'Z') goto what_say;
  if (*(p + 1) ≠ *p) cur_disp_addr.l = spec_reg_code[*p - 'A'], p++;
  else if (spec_regg_code[*p - 'A']) cur_disp_addr.l = spec_regg_code[*p - 'A'], p += 2;
  else goto what_say;
  goto new_mode;
case 'M': cur_disp_mode = *p;
  cur_disp_addr = scan_hex(p + 1, cur_seg); cur_disp_addr.l &= -8; p = next_char;
new_mode: cur_disp_set = false;   /* the '=' is remembered only by '+' */
  repeating = 1;
  goto scan_type;
case '+': if (¬isdigit(*(p + 1))) repeating = 1;
  for (p++; isdigit(*p); p++) repeating = 10 * repeating + *p - '0';
  if (repeating) {
    if (cur_disp_mode ≡ 'M') cur_disp_addr = incr(cur_disp_addr, 8);
    else cur_disp_addr.l++;
  }
  goto scan_type;
This code is used in section 149.

153. ⟨Cases that define cur_disp_type 153⟩ ≡
case '!': case '.': case '#': case '"': cur_disp_set = false;
  repeating = 1;
set_type: cur_disp_type = *p++; break;
scan_type: if (*p ≡ '!' ∨ *p ≡ '.' ∨ *p ≡ '#' ∨ *p ≡ '"') goto set_type;
  if (*p ≠ '=') break;
  goto scan_eql;
case '=': repeating = 1;
scan_eql: cur_disp_set = true;
  val = zero_octa;
  if (*++p ≡ '#') cur_disp_type = *p, val = scan_hex(p + 1, zero_octa);
  else if (*p ≡ '"' ∨ *p ≡ '\'') goto scan_string;
  else cur_disp_type = (scan_const(p) > 0 ? '.' : '!');
  p = next_char;
  if (*p ≠ ',') break;
  val.h = 0; val.l &= #ff;
scan_string: cur_disp_type = '"';
  ⟨Scan a string constant 155⟩; break;
This code is used in section 149.

154. ⟨Subroutines 12⟩ +≡
  octa scan_hex ARGS((char *, octa));
  octa scan_hex(s, offset)
       char *s;
       octa offset;
  {
    register char *p;
    octa o;
    o = zero_octa;
    for (p = s; isxdigit(*p); p++) {
      o = incr(shift_left(o, 4), *p - '0');
      if (*p ≥ 'a') o = incr(o, '0' - 'a' + 10);
      else if (*p ≥ 'A') o = incr(o, '0' - 'A' + 10);
    }
    next_char = p;
    return oplus(o, offset);
  }

155. ⟨Scan a string constant 155⟩ ≡
  while (*p ≡ ',') {
    if (*++p ≡ '#') {
      aux = scan_hex(p + 1, zero_octa), p = next_char;
      val = incr(shift_left(val, 8), aux.l & #ff);
    } else if (isdigit(*p)) {
      for (k = *p++ - '0'; isdigit(*p); p++) k = (10 * k + *p - '0') & #ff;
      val = incr(shift_left(val, 8), k);
    }
    else if (*p ≡ '\n') goto incomplete_str;
  }
  if (*p ≡ '\'' ∧ *(p + 2) ≡ *p) *p = *(p + 2) = '"';
  if (*p ≡ '"') {
    for (p++; *p ∧ *p ≠ '\n' ∧ *p ≠ '"'; p++) val = incr(shift_left(val, 8), *p);
    if (*p ∧ *p++ ≡ '"')
      if (*p ≡ ',') goto scan_string;
  }
```

This code is used in section 153.
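
The digit-by-digit accumulation performed by scan_hex in section 154 can be sketched with a native 64-bit integer in place of the two-tetra `octa` structure. The function `scan_hex64` and its `end` parameter are hypothetical stand-ins for scan_hex and the global `next_char`; arithmetic wraps modulo 2^64, and the comparisons assume an ASCII character set, as the simulator's own code does.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical 64-bit analogue of scan_hex: converts the hexadecimal digits
   at the start of s, adds offset, and stores in *end a pointer to the first
   character that was not consumed. */
static uint64_t scan_hex64(const char *s, uint64_t offset, const char **end)
{
    uint64_t o = 0;
    const char *p;
    for (p = s; isxdigit((unsigned char)*p); p++) {
        int d;
        if (*p >= 'a') d = *p - 'a' + 10;        /* lowercase a..f */
        else if (*p >= 'A') d = *p - 'A' + 10;   /* uppercase A..F */
        else d = *p - '0';                       /* decimal digit */
        o = (o << 4) + (uint64_t)d;
    }
    *end = p;
    return o + offset;
}
```

Passing a segment base as `offset` corresponds to the way the interactive commands supply `cur_seg`, so that short addresses are interpreted relative to the current segment. An empty digit string yields the offset itself.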

```
ARGS = macro (), §11.
aux: octa, MMIX-ARITH §4.
cur_disp_addr: octa, §151.
cur_disp_mode: char, §151.
cur_disp_set: bool, §151.
cur_disp_type: char, §151.
cur_seg: octa, §151.
false = 0, §9.
h: tetra, §10.
incomplete_str: label, §149.
incr: octa (), MMIX-ARITH §6.
isdigit: int (), <ctype.h>.
isxdigit: int (), <ctype.h>.
k: register int, §62.
l: tetra, §10.
next_char: char *, MMIX-ARITH §69.
octa = struct, §10.
oplus: octa (), MMIX-ARITH §5.
p: register char *, §62.
repeating: register int, §149.
scan_const: int (), MMIX-ARITH §68.
shift_left: octa (), MMIX-ARITH §7.
spec_reg_code: char [], §151.
spec_regg_code: char [], §151.
true = 1, §9.
val: octa, MMIX-ARITH §69.
what_say: label, §149.
zero_octa: octa, MMIX-ARITH §4.
```

```
156. ⟨Display and/or set the value of the current octabyte 156⟩ ≡
  {
    if (cur_disp_set) ⟨Set the current octabyte to val 157⟩;
    ⟨Display the current octabyte 159⟩;
    fputc('\n', stdout);
    repeating--;
    if (¬repeating) break;
    if (cur_disp_mode ≡ 'M') cur_disp_addr = incr(cur_disp_addr, 8);
    else cur_disp_addr.l++;
  }
This code is used in section 149.

157. ⟨Set the current octabyte to val 157⟩ ≡
  switch (cur_disp_mode) {
  case 'l': l[cur_disp_addr.l & lring_mask] = val; break;
  case '$': k = cur_disp_addr.l & #ff;
    if (k < L) l[(O + k) & lring_mask] = val; else if (k ≥ G) g[k] = val;
    break;
  case 'g': k = cur_disp_addr.l & #ff;
    if (k < 32) ⟨Set g[k] = val only if permissible 158⟩;
    g[k] = val; break;
  case 'M': if (¬(cur_disp_addr.h & sign_bit)) {
      ll = mem_find(cur_disp_addr);
      ll→tet = val.h; (ll + 1)→tet = val.l;
    } break;
  }
This code is used in section 156.

158. Here we essentially simulate a PUT command, but we simply break if the PUT is illegal or privileged.

⟨Set g[k] = val only if permissible 158⟩ ≡
  if (k ≥ 9 ∧ k ≠ rI) {
    if (k ≤ 19) break;
    if (k ≡ rA) {
      if (val.h ≠ 0 ∨ val.l ≥ #40000) break;
      cur_round = (val.l ≥ #10000 ? val.l ≫ 16 : ROUND_NEAR);
    } else if (k ≡ rG) {
      if (val.h ≠ 0 ∨ val.l > 255 ∨ val.l < L ∨ val.l < 32) break;
      for (j = val.l; j < G; j++) g[j] = zero_octa;
      G = val.l;
    } else if (k ≡ rL) {
      if (val.h ≡ 0 ∧ val.l < L) L = val.l;
      else break;
    }
  }
This code is used in section 157.
```

```
159. ⟨Display the current octabyte 159⟩ ≡
  switch (cur_disp_mode) {
  case 'l': k = cur_disp_addr.l & lring_mask;
    printf("l[%d]=", k); aux = l[k]; break;
  case '$': k = cur_disp_addr.l & #ff;
    if (k < L)
      printf("$%d=l[%d]=", k, (O + k) & lring_mask), aux = l[(O + k) & lring_mask];
    else if (k ≥ G) printf("$%d=g[%d]=", k, k), aux = g[k];
    else printf("$%d=", k), aux = zero_octa;
    break;
  case 'g': k = cur_disp_addr.l & #ff;
    printf("g[%d]=", k); aux = g[k]; break;
  case 'M': if (cur_disp_addr.h & sign_bit) aux = zero_octa;
    else {
      ll = mem_find(cur_disp_addr);
      aux.h = ll→tet; aux.l = (ll + 1)→tet;
    }
    printf("M8[#"); print_hex(cur_disp_addr); printf("]="); break;
  }
  switch (cur_disp_type) {
  case '!': print_int(aux); break;
  case '.': print_float(aux); break;
  case '#': fputc('#', stdout); print_hex(aux); break;
  case '"': print_string(aux); break;
  }
This code is used in section 156.
```
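The `'$'` cases of sections 157 and 159 map a dynamic register number onto either the local register ring or the global register array. A minimal sketch of that mapping follows; the enumeration `reg_kind` and the function `classify_reg` are hypothetical names introduced only for illustration, and "marginal" is used for the registers that lie between L and G and read as zero.

```c
/* Where a dynamic register $k currently lives. */
typedef enum { REG_LOCAL, REG_MARGINAL, REG_GLOBAL } reg_kind;

/* Hypothetical helper: classifies $k given the current values of L, G, and O
   and the ring mask. For a local register, *index receives the position in
   the ring l[]; for a global register, the position in g[]; for a marginal
   register, -1. */
static reg_kind classify_reg(int k, int L, int G, int O, int lring_mask, int *index)
{
    k &= 0xff;                      /* register numbers are one byte */
    if (k < L) {
        *index = (O + k) & lring_mask;
        return REG_LOCAL;
    }
    if (k >= G) {
        *index = k;
        return REG_GLOBAL;
    }
    *index = -1;                    /* neither local nor global */
    return REG_MARGINAL;
}
```

With L = 3, G = 32, O = 254, and a ring mask of 255, register $2 maps to ring position 0 because the ring index wraps around, $40 maps to g[40], and $10 is marginal.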

```
aux: octa, MMIX-ARITH §4.
cur_disp_addr: octa, §151.
cur_disp_mode: char, §151.
cur_disp_set: bool, §151.
cur_disp_type: char, §151.
cur_round: int, MMIX-ARITH §30.
fputc: int (), <stdio.h>.
g: octa [], §76.
G: register int, §75.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
j: register int, §62.
k: register int, §62.
l: tetra, §10.
l: octa *, §76.
L: register int, §75.
ll: register mem_tetra *, §62.
lring_mask: int, §76.
mem_find: mem_tetra *(), §20.
O: register int, §75.
print_float: void (), MMIX-ARITH §54.
print_hex: void (), §12.
print_int: void (), §15.
print_string: void (), §160.
printf: int (), <stdio.h>.
rA = 21, §55.
repeating: register int, §149.
rG = 19, §55.
rI = 12, §55.
rL = 20, §55.
ROUND_NEAR = 4, §100.
sign_bit = macro, §15.
stdout: FILE *, <stdio.h>.
tet: tetra, §16.
val: octa, MMIX-ARITH §69.
zero_octa: octa, MMIX-ARITH §4.
```

```
160. ⟨Subroutines 12⟩ +≡
  void print_string ARGS((octa));
  void print_string(o)
       octa o;
  {
    register int k, state, b;
    for (k = state = 0; k < 8; k++) {
      b = ((k < 4 ? o.h ≫ (8 * (3 - k)) : o.l ≫ (8 * (7 - k)))) & #ff;
      if (b ≡ 0) {
        if (state) printf("%s,0", state > 1 ? "\"" : ""), state = 1;
      } else if (b ≥ ' ' ∧ b ≤ '~')
        printf("%s%c", state > 1 ? "" : state ≡ 1 ? ",\"" : "\"", b), state = 2;
      else printf("%s#%x", state > 1 ? "\"," : state ≡ 1 ? "," : "", b), state = 1;
    }
    if (state ≡ 0) printf("0");
    else if (state > 1) printf("\"");
  }

161. ⟨Cases that set and clear tracing and breakpoints 161⟩ ≡
case '@': inst_ptr = scan_hex(p + 1, cur_seg); p = next_char;
  halted = false; break;
case 't': case 'u': k = *p;
  val = scan_hex(p + 1, cur_seg); p = next_char;
  if (val.h < #20000000) {
    ll = mem_find(val);
    if (k ≡ 't') ll→bkpt |= trace_bit;
    else ll→bkpt &= ~trace_bit;
  }
  break;
case 'b': for (k = 0, p++; ¬isxdigit(*p); p++)
    if (*p ≡ 'r') k |= read_bit;
    else if (*p ≡ 'w') k |= write_bit;
    else if (*p ≡ 'x') k |= exec_bit;
  val = scan_hex(p, cur_seg); p = next_char;
  if (¬(val.h & sign_bit)) {
    ll = mem_find(val);
    ll→bkpt = (ll→bkpt & -8) | k;
  }
  break;
case 'T': cur_seg.h = 0; goto passit;
case 'D': cur_seg.h = #20000000; goto passit;
case 'P': cur_seg.h = #40000000; goto passit;
case 'S': cur_seg.h = #60000000; goto passit;
case 'B': show_breaks(mem_root);
passit: p++; break;
This code is used in section 149.
```
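The three-state formatter print_string of section 160 can be exercised more easily when it writes into a buffer instead of standard output. The sketch below follows the same state logic under stated assumptions: `format_octa_string` is a hypothetical name, the octabyte is passed as a single `uint64_t`, and the caller is assumed to supply a buffer of at least 48 bytes.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer-based variant of print_string: formats the eight bytes
   of v, most significant first, into out (at least 48 bytes). */
static void format_octa_string(uint64_t v, char *out)
{
    int k, state = 0;   /* 0: nothing emitted, 1: after a number, 2: inside quotes */
    char *q = out;
    *q = '\0';
    for (k = 0; k < 8; k++) {
        unsigned b = (unsigned)(v >> (8 * (7 - k))) & 0xff;
        if (b == 0) {
            /* leading zero bytes are suppressed; later ones appear as ",0" */
            if (state) {
                q += sprintf(q, "%s,0", state > 1 ? "\"" : "");
                state = 1;
            }
        } else if (b >= ' ' && b <= '~') {
            /* printable character: open a quoted run if one is not open */
            q += sprintf(q, "%s%c", state > 1 ? "" : state == 1 ? ",\"" : "\"", (int)b);
            state = 2;
        } else {
            /* other byte: close any quoted run and emit it in hexadecimal */
            q += sprintf(q, "%s#%x", state > 1 ? "\"," : state == 1 ? "," : "", b);
            state = 1;
        }
    }
    if (state == 0) strcpy(q, "0");
    else if (state > 1) strcpy(q, "\"");
}
```

For example, an octabyte whose low three bytes are 'H', 'i', and a newline character is rendered as a quoted run followed by a hexadecimal byte, and an all-zero octabyte is rendered as the single digit 0.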

```
162. ⟨Subroutines 12⟩ +≡
  void show_breaks ARGS((mem_node *));
  void show_breaks(p)
       mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p→left) show_breaks(p→left);
    for (j = 0; j < 512; j++)
      if (p→dat[j].bkpt) {
        cur_loc = incr(p→loc, 4 * j);
        printf("  %08x%08x %c%c%c%c\n", cur_loc.h, cur_loc.l,
               p→dat[j].bkpt & trace_bit ? 't' : '-', p→dat[j].bkpt & read_bit ? 'r' : '-',
               p→dat[j].bkpt & write_bit ? 'w' : '-', p→dat[j].bkpt & exec_bit ? 'x' : '-');
      }
    if (p→right) show_breaks(p→right);
  }

163. We put pointers to the command line strings in octabytes M8[Pool_Segment + 8*(k+1)] for 0 ≤ k < argc; the strings themselves are octabyte-aligned, starting at M8[Pool_Segment + 8*(argc+2)]. The location of the first free octabyte in the pool segment is placed in M8[Pool_Segment].

⟨Load the command line arguments 163⟩ ≡
  x.h = #40000000, x.l = #8;
  loc = incr(x, 8 * (argc + 1));
  for (k = 0; k < argc; k++, cur_arg++) {
    ll = mem_find(x);
    ll→tet = loc.h, (ll + 1)→tet = loc.l;
    ll = mem_find(loc);
    mmputchars((unsigned char *) *cur_arg, strlen(*cur_arg), loc);
    x.l += 8, loc.l += 8 + (strlen(*cur_arg) & -8);
  }
  x.l = 0; ll = mem_find(x); ll→tet = loc.h, (ll + 1)→tet = loc.l;
This code is used in section 141.
```
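The pool-segment layout described in section 163 can be checked with a small address calculation. In this sketch the name `pool_layout` and the output array `str_addr` are hypothetical, and addresses are held in a single `uint64_t` instead of an `octa`; no simulated memory is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SEGMENT 0x4000000000000000ULL   /* base address of the pool segment */

/* Hypothetical helper: computes where each command line string would be
   placed. str_addr[k] is the value stored at POOL_SEGMENT + 8*(k+1); the
   return value is the first free octabyte, stored at POOL_SEGMENT itself. */
static uint64_t pool_layout(int argc, char *const argv[], uint64_t str_addr[])
{
    uint64_t loc = POOL_SEGMENT + 8 * ((uint64_t)argc + 2);
    int k;
    for (k = 0; k < argc; k++) {
        size_t len = strlen(argv[k]);
        str_addr[k] = loc;
        /* round the length down to a multiple of 8 and add one more octabyte,
           which leaves room for at least one terminating null byte */
        loc += 8 + (len & ~(size_t)7);
    }
    return loc;
}
```

For two arguments the pointer octabytes occupy POOL_SEGMENT+8 and POOL_SEGMENT+16, the octabyte at POOL_SEGMENT+24 is never written, and the first string begins at POOL_SEGMENT+32. A string of exactly eight characters takes two octabytes, because the second one holds the terminating null byte.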

```
argc: int, §141.
ARGS = macro (), §11.
bkpt: unsigned char, §16.
cur_arg: char **, §144.
cur_seg: octa, §151.
dat: mem_tetra [], §16.
exec_bit = macro, §58.
false = 0, §9.
h: tetra, §10.
halted: bool, §61.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
isxdigit: int (), <ctype.h>.
k: register int, §62.
l: tetra, §10.
left: mem_node *, §16.
ll: register mem_tetra *, §62.
loc: octa, §16.
loc: octa, §61.
mem_find: mem_tetra *(), §20.
mem_node = struct, §16.
mem_root: mem_node *, §19.
mmputchars: void (), §117.
next_char: char *, MMIX-ARITH §69.
octa = struct, §10.
p: register char *, §62.
printf: int (), <stdio.h>.
read_bit = macro, §58.
right: mem_node *, §16.
scan_hex: octa (), §154.
sign_bit = macro, §15.
strlen: size_t (), <string.h>.
tet: tetra, §16.
trace_bit = macro, §58.
val: octa, MMIX-ARITH §69.
write_bit = macro, §58.
x: octa, §61.
```

```
164. ⟨Get ready to UNSAVE the initial context 164⟩ ≡
  x.h = 0, x.l = #f0;
  ll = mem_find(x);
  if (ll→tet) inst_ptr = x;
  resuming = true;
  rop = RESUME_AGAIN;
  g[rX].l = ((tetra) UNSAVE ≪ 24) + 255;
  if (dump_file) {
    x.l = 1;
    dump(mem_root);
    dump_tet(0), dump_tet(0);
    exit(0);
  }
```

This code is used in section 141.

165. The special option '-D<filename>' can be used to prepare binary files needed by the MMIX-in-MMIX simulator of Section 1.4.3′. (See *The Art of Computer Programming*, Volume 1, Fascicle 1.) This option puts big-endian octabytes into a given file; a location $l$ is followed by one or more nonzero octabytes $M_8[l]$, $M_8[l+8]$, $M_8[l+16]$, ..., followed by zero. The simulated simulator knows how to load programs in such a format (see exercise 1.4.3′-20), and so does the meta-simulator MMMIX.

```
⟨Subroutines 12⟩ +≡
  void dump ARGS((mem_node *));
  void dump_tet ARGS((tetra));
  void dump(p)
       mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p→left) dump(p→left);
    for (j = 0; j < 512; j += 2)
      if (p→dat[j].tet ∨ p→dat[j + 1].tet) {
        cur_loc = incr(p→loc, 4 * j);
        if (cur_loc.l ≠ x.l ∨ cur_loc.h ≠ x.h) {
          if (x.l ≠ 1) dump_tet(0), dump_tet(0);
          dump_tet(cur_loc.h); dump_tet(cur_loc.l); x = cur_loc;
        }
        dump_tet(p→dat[j].tet);
        dump_tet(p→dat[j + 1].tet);
        x = incr(x, 8);
      }
    if (p→right) dump(p→right);
  }
```

```
166. ⟨Subroutines 12⟩ +≡
  void dump_tet(t)
       tetra t;
  {
    fputc(t ≫ 24, dump_file);
    fputc((t ≫ 16) & #ff, dump_file);
    fputc((t ≫ 8) & #ff, dump_file);
    fputc(t & #ff, dump_file);
  }
```
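The dump format described in section 165 (a location, then one or more nonzero octabytes, then a zero octabyte, all big-endian) can be read back with a loop like the one sketched here. This is not the loader used by the MMIX-in-MMIX simulator or by MMMIX; `get_octa` and `load_dump` are hypothetical names, and the image is assumed to be already in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one big-endian octabyte from buf. */
static uint64_t get_octa(const unsigned char *buf)
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; i++)
        v = (v << 8) | buf[i];
    return v;
}

/* Hypothetical loader: walks a dump image of n bytes and records up to max
   (address, value) pairs. Returns the number of octabytes recorded, or -1
   if the image is truncated or holds more than max octabytes. */
static long load_dump(const unsigned char *buf, size_t n,
                      uint64_t addr[], uint64_t val[], size_t max)
{
    size_t pos = 0, count = 0;
    while (pos < n) {
        uint64_t loc, v;
        if (n - pos < 8) return -1;
        loc = get_octa(buf + pos);        /* each group starts with its location */
        pos += 8;
        for (;;) {
            if (n - pos < 8) return -1;   /* the zero terminator is missing */
            v = get_octa(buf + pos);
            pos += 8;
            if (v == 0) break;            /* a zero octabyte ends the group */
            if (count == max) return -1;
            addr[count] = loc;
            val[count] = v;
            count++;
            loc += 8;
        }
    }
    return (long)count;
}
```

Because a zero octabyte acts as the terminator, the first octabyte of every group is always read as a location, even when that location is itself zero.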

```
ARGS = macro (), §11.
dat: mem_tetra [], §16.
dump_file: FILE *, §144.
exit: void (), <stdlib.h>.
fputc: int (), <stdio.h>.
g: octa [], §76.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
l: tetra, §10.
left: mem_node *, §16.
ll: register mem_tetra *, §62.
loc: octa, §16.
mem_find: mem_tetra *(), §20.
mem_node = struct, §16.
mem_root: mem_node *, §19.
octa = struct, §10.
RESUME_AGAIN = 0, §125.
resuming: bool, §61.
right: mem_node *, §16.
rop: int, §61.
rX = 25, §55.
tet: tetra, §16.
tetra = unsigned int, §10.
true = 1, §9.
UNSAVE = #fb, §54.
x: octa, §61.
```

## 167. Names of the sections.

```
⟨Cases for formatting characters 134, 136, 138⟩ Used in section 133.
⟨Cases for individual MMIX instructions 84, 85, 86, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 101, 102, 104, 106, 107, 108, 124⟩ Used in section 60.
⟨Cases for lopcodes in the main loop 33, 34, 35, 36⟩ Used in section 29.
⟨Cases that change cur_disp_mode 152⟩ Used in section 149.
⟨Cases that define cur_disp_type 153⟩ Used in section 149.
⟨Cases that set and clear tracing and breakpoints 161⟩ Used in section 149.
⟨Check for trip interrupt 122⟩ Used in section 60.
⟨Check if the source file has been modified 44⟩ Used in section 42.
⟨Convert relative address to absolute address 70⟩ Used in section 60.
⟨Display and/or set the value of the current octabyte 156⟩ Used in section 149.
⟨Display the current octabyte 159⟩ Used in section 156.
⟨Either halt or print warning 109⟩ Used in section 108.
⟨Fetch the next instruction 63⟩ Used in section 60.
⟨Fix up the subtrees of *q 22⟩ Used in section 21.
⟨Get ready to UNSAVE the initial context 164⟩ Used in section 141.
⟨Get ready to update rA 100⟩ Used in section 97.
⟨Get ready to update rG 99⟩ Used in section 97.
⟨Global variables 19, 25, 31, 40, 48, 52, 56, 61, 65, 76, 110, 113, 121, 129, 139, 144, 151⟩ Used in section 141.
⟨Increase rL 81⟩ Used in section 80.
⟨Info for arithmetic commands 66⟩ Used in section 65.
⟨Info for branch commands 67⟩ Used in section 65.
⟨Info for load/store commands 68⟩ Used in section 65.
⟨Info for logical and control commands 69⟩ Used in section 65.
⟨Initialize everything 14, 18, 24, 32, 41, 77, 147⟩ Used in section 141.
⟨Initiate a trip interrupt 123⟩ Used in section 122.
⟨Install operand fields 71⟩ Used in section 60.
⟨Install register X as the destination, adjusting the register stack if necessary 80⟩ Used in section 60.
⟨Install special operands when resuming an interrupted operation 126⟩ Used in section 71.
⟨Interact with the user 149⟩ Used in section 141.
⟨Interpret character *p in the trace format 133⟩ Used in section 131.
⟨Load and write four bytes 119⟩ Used in section 117.
⟨Load and write one byte 118⟩ Used in section 117.
⟨Load g[k] from the register stack 105⟩ Used in section 104.
⟨Load tet as a normal item 30⟩ Used in section 29.
⟨Load the command line arguments 163⟩ Used in section 141.
⟨Load the next item 29⟩ Used in section 32.
⟨Load the postamble 37⟩ Used in section 32.
⟨Load the preamble 28⟩ Used in section 32.
⟨Local registers 62, 75⟩ Used in section 141.
⟨Open a file for dumping binary output 146⟩ Used in section 143.
⟨Open a file for simulated standard input 145⟩ Used in section 143.
⟨Perform one instruction 60⟩ Used in section 141.
⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 111⟩ Used in section 108.
⟨Prepare to list lines from a new source file 49⟩ Used in section 47.
⟨Prepare to perform a ropcode 125⟩ Used in section 124.
⟨Preprocessor macros 11, 43, 46⟩ Used in section 141.
⟨Print a stream-of-consciousness description of the instruction 131⟩ Used in section 128.
⟨Print all the frequency counts 53⟩ Used in section 141.
⟨Print changes to rL 132⟩ Used in section 131.
⟨Print frequency data for location p->loc + 4*j 51⟩ Used in section 50.
⟨Print the frequency count, the location, and the instruction 130⟩ Used in section 128.
⟨Process the command line 142⟩ Used in section 141.
⟨Put a new command in command_buf 150⟩ Used in section 149.
⟨Read and store one byte; return if done 115⟩ Used in section 114.
⟨Read and store up to four bytes; return if done 116⟩ Used in section 114.
⟨Scan a string constant 155⟩ Used in section 153.
⟨Search for key in the treap, setting last_mem and p to its location 21⟩ Used in section 20.
⟨Set b from register X 74⟩ Used in section 71.
⟨Set b from special register 79⟩ Used in section 71.
⟨Set g[k] = val only if permissible 158⟩ Used in section 157.
⟨Set L = z = min(z, L) 98⟩ Used in section 97.
⟨Set the current octabyte to val 157⟩ Used in section 156.
⟨Set y from register Y 73⟩ Used in section 71.
⟨Set z as an immediate wyde 78⟩ Used in section 71.
⟨Set z from register Z 72⟩ Used in section 71.
⟨Store g[k] in the register stack 103⟩ Used in section 102.
⟨Subroutines 12, 13, 15, 17, 20, 26, 27, 42, 45, 47, 50, 82, 83, 91, 114, 117, 120, 137, 140, 143, 148, 154, 160, 162, 165, 166⟩ Used in section 141.
⟨Trace the current instruction, if requested 128⟩ Used in section 60.
⟨Type declarations 9, 10, 16, 38, 39, 54, 55, 59, 64, 135⟩ Used in section 141.
⟨Update the clocks 127⟩ Used in section 60.
```

1. Definition of MMIXAL. This program takes input written in MMIXAL, the MMIX assembly language, and translates it into binary files that can be loaded and executed on MMIX simulators. MMIXAL is much simpler than the "industrial strength" assembly languages that computer manufacturers usually provide, because it is primarily intended for the simple demonstration programs in *The Art of Computer Programming*. Yet it tries to have enough features to serve also as the back end of compilers for C and other high-level languages.

Instructions for using the program appear at the end of this document. First we will discuss the input and output languages in detail; then we'll consider the translation process, step by step; then we'll put everything together.

2. A program in MMIXAL consists of a series of *lines*, each of which usually contains a single instruction. However, lines with no instructions are possible, and so are lines with two or more instructions.

Each instruction has three parts called its label field, opcode field, and operand field; these fields are separated from each other by one or more spaces. The label field, which is often empty, consists of all characters up to the first blank space. The opcode field, which is never empty, runs from the first nonblank after the label to the next blank space. The operand field, which again might be empty, runs from the next nonblank character (if any) to the first blank or semicolon that isn't part of a string or character constant. If the operand field is followed by a semicolon, possibly with intervening blanks, a new instruction begins immediately after the semicolon; otherwise the rest of the line is ignored. The end of a line is treated as a blank space for the purposes of these rules, with the additional proviso that string or character constants are not allowed to extend from one line to another.

The label field must begin with a letter or a digit; otherwise the entire line is treated as a comment. Popular ways to introduce comments, either at the beginning of a line or after the operand field, are to precede them by the character % as in TeX, or by // as in C++; MMIXAL is not very particular. However, Lisp-style comments introduced by single semicolons will fail if they follow an instruction, because they will be assumed to introduce another instruction.

3. MMIXAL has no built-in macro capability, nor does it know how to include header files and such things. But users can run their files through a standard C preprocessor to obtain MMIXAL programs in which macros and such things have been expanded. (Caution: The preprocessor also removes C-style comments, unless it is told not to do so.) Literate programming tools could also be used for preprocessing.

If a line begins with the special form '# ⟨integer⟩ ⟨string⟩', this program interprets it as a *line directive* emitted by a preprocessor. For example,

\# 13 "foo.mms"

means that the following line was line 13 in the user's source file foo.mms. Line directives allow us to correlate errors with the user's original file; we also pass them to the output, for use by simulators and debuggers.

**4.** MMIXAL deals primarily with *symbols* and *constants*, which it interprets and combines to form machine language instructions and data. Constants are simplest, so we will discuss them first.

A decimal constant is a sequence of digits, representing a number in radix 10. A hexadecimal constant is a sequence of hexadecimal digits, preceded by #, representing a number in radix 16:

```
⟨digit⟩ → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
⟨hex digit⟩ → ⟨digit⟩ | A | B | C | D | E | F | a | b | c | d | e | f
⟨decimal constant⟩ → ⟨digit⟩ | ⟨decimal constant⟩⟨digit⟩
⟨hex constant⟩ → #⟨hex digit⟩ | ⟨hex constant⟩⟨hex digit⟩
```

Constants whose value is  $2^{64}$  or more are reduced modulo  $2^{64}$ .
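
The reduction modulo $2^{64}$ comes for free when constants are accumulated in unsigned 64-bit arithmetic. Here is a minimal sketch of a scanner for decimal and hexadecimal constants under that assumption; parse_constant is a hypothetical helper written for illustration, not the routine used by MMIXAL.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Parse a decimal constant, or a hex constant introduced by '#'.
   Unsigned 64-bit arithmetic wraps around, so the result is
   automatically reduced modulo 2^64.
   Returns 1 on success and 0 if s is not a well-formed constant.
   parse_constant is a hypothetical name. */
int parse_constant(const char *s, uint64_t *value)
{
    uint64_t v = 0;
    assert(s != 0 && value != 0);
    if (*s == '#') {
        if (!isxdigit((unsigned char)s[1])) return 0;   /* '#' needs a digit */
        for (s++; isxdigit((unsigned char)*s); s++) {
            int c = tolower((unsigned char)*s);
            v = v * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
    } else {
        if (!isdigit((unsigned char)*s)) return 0;
        for (; isdigit((unsigned char)*s); s++)
            v = v * 10 + (uint64_t)(*s - '0');
    }
    if (*s != '\0') return 0;                           /* trailing characters */
    *value = v;
    return 1;
}
```

With this sketch the decimal constant 18446744073709551616, which is $2^{64}$, yields the value 0.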

5. A character constant is a single character enclosed in single quote marks; it denotes the ASCII or Unicode number corresponding to that character. For example, 'a' represents the constant #61, also known as 97. The quoted character can be anything except the character that the C library calls \n or newline; that character should be represented as #a.

```
⟨character constant⟩ → '⟨character except newline⟩'
⟨constant⟩ → ⟨decimal constant⟩ | ⟨hex constant⟩ | ⟨character constant⟩
```

Notice that ''' represents a single quote, the code #27; and '\' represents a backslash, the code #5c. MMIXAL characters are never "quoted" by backslashes as in the C language.

In the present implementation a character constant will always be at most 255, since wyde character input is not supported. But if the input were in Unicode one could write, say, 'א' or 'Ж' for #05d0 or #0416. The present program does not support Unicode directly because basic software for inputting and outputting 16-bit characters was still in a primitive state at the time of writing. But the data structures below are designed so that a change to Unicode will not be difficult when the time is ripe.

6. A string constant like "Hello" is an abbreviation for a sequence of one or more character constants separated by commas: 'H','e','l','l','o'. Any character except newline or the double quote mark " can appear between the double quotes of a string constant. Similarly, "高德纳" is an abbreviation for '高','德','纳' (namely #9ad8,#5fb7,#7eb3) when Unicode is supported.

7. A symbol in MMIXAL is any sequence of letters and digits, beginning with a letter. A colon ':' or underscore symbol '\_' is regarded as a letter, for purposes of this definition. All extended-ASCII characters like 'é', whose 8-bit code exceeds 126, are also treated as letters.

```
⟨letter⟩ → A | B | ··· | Z | a | b | ··· | z | : | _ | ⟨character with code value > 126⟩
⟨symbol⟩ → ⟨letter⟩ | ⟨symbol⟩⟨letter⟩ | ⟨symbol⟩⟨digit⟩
```

In future implementations, when MMIXAL is used with Unicode, all wyde characters whose 16-bit code exceeds 126 will be regarded as letters; thus MMIXAL symbols will be able to involve Greek letters or Chinese characters or thousands of other glyphs.

8. A symbol is said to be *fully qualified* if it begins with a colon. Every symbol that is not fully qualified is an abbreviation for the fully qualified symbol obtained by placing the *current prefix* in front of it; the current prefix is always fully qualified. At the beginning of an MMIXAL program the current prefix is simply the single character ':', but the user can change it with the PREFIX command. For example,

```
        ADD     x,y,z              % means ADD :x,:y,:z
        PREFIX  Foo:               % current prefix is :Foo:
        ADD     x,y,z              % means ADD :Foo:x,:Foo:y,:Foo:z
        PREFIX  Bar:               % current prefix is :Foo:Bar:
        ADD     :x,y,:z            % means ADD :x,:Foo:Bar:y,:z
        PREFIX  :                  % current prefix reverts to :
        ADD     x,Foo:Bar:y,Foo:z  % means ADD :x,:Foo:Bar:y,:Foo:z
```

This mechanism allows large programs to avoid conflicts between symbol names, when parts of the program are independent and/or written by different users. The current prefix conventionally ends with a colon, but this convention need not be obeyed.
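
The qualification rule amounts to a conditional string concatenation. The sketch below is one way to express it, assuming NUL-terminated strings and a caller-supplied output buffer; qualify and its parameters are hypothetical names chosen for illustration, not the data structures that MMIXAL actually uses.

```c
#include <assert.h>
#include <string.h>

/* Expand sym to its fully qualified form under the current prefix.
   A symbol that already begins with ':' is copied unchanged.
   Returns 1 on success, or 0 if out (of size outsize) is too small.
   qualify is a hypothetical helper. */
int qualify(const char *prefix, const char *sym, char *out, size_t outsize)
{
    size_t plen, slen = strlen(sym);
    assert(prefix[0] == ':');            /* the current prefix is fully qualified */
    plen = (sym[0] == ':') ? 0 : strlen(prefix);
    if (plen + slen + 1 > outsize) return 0;
    memcpy(out, prefix, plen);
    memcpy(out + plen, sym, slen + 1);   /* copies the terminating NUL as well */
    return 1;
}
```

The same function models the PREFIX command itself: the new current prefix is the qualified form of the operand, so 'Bar:' under the prefix ':Foo:' becomes ':Foo:Bar:', while ':' stays ':'.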

**9.** A *local symbol* is a decimal digit followed by one of the letters B, F, or H, meaning "backward," "forward," or "here":

```
⟨local operand⟩ → ⟨digit⟩B | ⟨digit⟩F
⟨local label⟩ → ⟨digit⟩H
```

The B and F forms are permitted only in the operand field of MMIXAL instructions; the H form is permitted only in the label field. A local operand such as 2B stands for the last local label 2H in instructions before the current one, or 0 if 2H has not yet appeared as a label. A local operand such as 2F stands for the first 2H in instructions after the current one. Thus, in a sequence such as

2H JMP 2F

2H JMP 2B

the first instruction jumps to the second and the second jumps to the first.

Local symbols are useful for references to nearby points of a program, in cases where no meaningful name is appropriate. They can also be useful in special situations where a redefinable symbol is needed; for example, an instruction like

9H IS 9B+1

will maintain a running counter.

10. Each symbol receives a value called its *equivalent* when it appears in the label field of an instruction; it is said to be *defined* after its equivalent has been established. A few symbols, like rA and ROUND\_OFF and Fopen, are predefined because they refer to fixed constants associated with the MMIX hardware or its rudimentary operating system; otherwise every symbol should be defined exactly once. The two appearances of '2H' in the example above do not violate this rule, because the second '2H' is not the same symbol as the first.

A predefined symbol can be redefined (given a new equivalent). After it has been redefined it acts like an ordinary symbol and cannot be redefined again. A complete list of the predefined symbols appears in the program listing below.

Equivalents are either *pure* or *register numbers*. A pure equivalent is an unsigned octabyte, but a register number equivalent is a one-byte value, between 0 and 255. A dollar sign is used to change a pure number into a register number; for example, '\$20' means register number 20.

11. Constants and symbols are combined into expressions in a simple way:

⟨primary expression⟩ → ⟨constant⟩ | ⟨symbol⟩ | ⟨local operand⟩ | @ | (⟨expression⟩) | ⟨unary operator⟩⟨primary expression⟩

⟨term⟩ → ⟨primary expression⟩ | ⟨term⟩⟨strong operator⟩⟨primary expression⟩

⟨expression⟩ → ⟨term⟩ | ⟨expression⟩⟨weak operator⟩⟨term⟩

⟨unary operator⟩ → + | - | ~ | \$ | &

⟨strong operator⟩ → * | / | // | % | << | >> | &

⟨weak operator⟩ → + | - | | | ^

Each expression has a value that is either pure or a register number. The character @ stands for the current location, which is always pure. The unary operators +, -, ~, \$, and & mean, respectively, "do nothing," "subtract from zero," "complement the bits," "change from pure value to register number," and "take the serial number." Only the first of these, +, can be applied to a register number. The last unary operator, &, applies only to symbols, and it is of interest primarily to system programmers; it converts a symbol to the unique positive integer that is used to identify it in the binary file output by MMIXAL.

Binary operators come in two flavors, strong and weak. The strong ones are essentially concerned with multiplication or division: x*y, x/y, x//y, x%y, x<<y, x>>y, and x&y stand respectively for $(x \times y) \bmod 2^{64}$ (multiplication), $\lfloor x/y \rfloor$ (division), $\lfloor 2^{64}x/y \rfloor$ (fractional division), $x \bmod y$ (remainder), $(x \times 2^y) \bmod 2^{64}$ (left shift), $\lfloor x/2^y \rfloor$ (right shift), and $x \mathbin{\&} y$ (bitwise and) on unsigned octabytes. Division is legal only if y > 0; fractional division is legal only if x < y. None of the strong binary operations can be applied to register numbers.

The weak binary operations x+y, x-y, x|y, and x^y stand respectively for $(x+y) \bmod 2^{64}$ (addition), $(x-y) \bmod 2^{64}$ (subtraction), $x \mid y$ (bitwise or), and $x \oplus y$ (bitwise exclusive-or) on unsigned octabytes. These operations can be applied to register numbers only in four contexts: ⟨register⟩+⟨pure⟩, ⟨pure⟩+⟨register⟩, ⟨register⟩-⟨pure⟩, and ⟨register⟩-⟨register⟩. For example, if x denotes \$1 and y denotes \$10, then x+3 and 3+x denote \$4, and y-x denotes the pure value 9.

Register numbers within expressions are allowed to be arbitrary octabytes, but a register number assigned as the equivalent of a symbol should not exceed 255.

(Incidentally, one might ask why the designer of MMIXAL did not simply adopt the existing rules of C for expressions. The primary reason is that the designers of C chose to give <<, >>, and & a lower precedence than +; but in MMIXAL we want to be able to write things like o<<24+x<<16+y<<8+z or @+yz<<2 or @+(#100-@)&#ff. Since the conventions of C were inappropriate, it was better to make a clean break, not pretending to have a close relationship with that language. The new rules are quite easily memorized, because MMIXAL has just two levels of precedence, and the strong binary operations are all essentially multiplicative by nature while the weak binary operations are essentially additive.)
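
The two precedence levels are easy to see in a small recursive-descent evaluator. The following sketch handles pure values only; it omits symbols, @, register numbers, the unary operators \$ and &, and the fractional division operator //. It also rejects a zero divisor for % as well as for /, which is an assumption of the sketch. All function names are hypothetical; this is not the expression scanner used by MMIXAL.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

static const char *p;   /* current scanning position */
static int err;         /* nonzero once the input is found to be malformed */

static uint64_t expression(void);

/* <primary> -> constant | ( expression ) | unary-operator primary */
static uint64_t primary(void)
{
    uint64_t v = 0;
    if (*p == '+') { p++; return primary(); }       /* do nothing */
    if (*p == '-') { p++; return 0 - primary(); }   /* subtract from zero */
    if (*p == '~') { p++; return ~primary(); }      /* complement the bits */
    if (*p == '(') {
        p++;
        v = expression();
        if (*p == ')') p++; else err = 1;
        return v;
    }
    if (*p == '#' && isxdigit((unsigned char)p[1])) {
        for (p++; isxdigit((unsigned char)*p); p++) {
            int c = tolower((unsigned char)*p);
            v = v * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
        return v;
    }
    if (isdigit((unsigned char)*p)) {
        for (; isdigit((unsigned char)*p); p++)
            v = v * 10 + (uint64_t)(*p - '0');
        return v;
    }
    err = 1;
    return 0;
}

/* Strong operators * / % << >> & bind tighter; they associate to the left. */
static uint64_t term(void)
{
    uint64_t x = primary(), y;
    for (;;) {
        if (*p == '*') { p++; x *= primary(); }
        else if (*p == '/' || *p == '%') {
            char op = *p++;
            y = primary();
            if (y == 0) { err = 1; return 0; }
            x = (op == '/') ? x / y : x % y;
        }
        else if (p[0] == '<' && p[1] == '<') {
            p += 2; y = primary();
            x = (y < 64) ? x << y : 0;   /* avoid undefined shifts in C */
        }
        else if (p[0] == '>' && p[1] == '>') {
            p += 2; y = primary();
            x = (y < 64) ? x >> y : 0;
        }
        else if (*p == '&') { p++; x &= primary(); }
        else return x;
    }
}

/* Weak operators + - | ^ bind more loosely; they associate to the left. */
static uint64_t expression(void)
{
    uint64_t x = term();
    for (;;) {
        if (*p == '+') { p++; x += term(); }
        else if (*p == '-') { p++; x -= term(); }
        else if (*p == '|') { p++; x |= term(); }
        else if (*p == '^') { p++; x ^= term(); }
        else return x;
    }
}

/* Evaluate s; returns 1 and stores the value on success, 0 on error. */
int eval_pure(const char *s, uint64_t *value)
{
    assert(s != 0 && value != 0);
    p = s;
    err = 0;
    *value = expression();
    return !err && *p == '\0';
}
```

Under these rules an input such as 1<<24+2<<16+3<<8+4 is grouped as (1<<24)+(2<<16)+(3<<8)+4, whereas C would shift by the sums.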

12. A symbol is called a *future reference* until it has been defined. MMIXAL restricts the use of future references, so that programs can be assembled quickly in one pass over the input; therefore all expressions can be evaluated when the MMIXAL processor first sees them.

The restrictions are easily stated: Future references cannot be used in expressions together with unary or binary operators (except the unary +, which does nothing); moreover, future references can appear as operands only in instructions that have relative addresses (namely branches, probable branches, JMP, PUSHJ, GETA) or in octabyte constants (the pseudo-operation OCTA). Thus, for example, one can say JMP 1F or JMP 1B-4, but not JMP 1F-4.

13. We noted earlier that each MMIXAL instruction contains a label field, an opcode field, and an operand field. The label field is either empty or a symbol or local label; when it is nonempty, the symbol or local label receives an equivalent. The operand field is either empty or a sequence of expressions separated by commas; when it is empty, it is equivalent to the simple operand field '0'.

The opcode field contains either a symbolic MMIX operation name (like ADD), or an *alias operation*, or a *pseudo-operation*. Alias operations are alternate names for MMIX operations whose standard names are inappropriate in certain contexts. Pseudo-operations do not correspond directly to MMIX commands, but they govern the assembly process in important ways.

There are two alias operations:

- SET \$X,\$Y is equivalent to OR \$X,\$Y,0; it sets register X to register Y. Similarly, SET \$X,Y (when Y is not a register) is equivalent to SETL \$X,Y.
- LDA \$X,\$Y,\$Z is equivalent to ADDU \$X,\$Y,\$Z; it loads the address of memory location \$Y+\$Z into register X. Similarly, LDA \$X,\$Y,Z is equivalent to ADDU \$X,\$Y,Z.

The symbolic operation names for genuine MMIX operations should not include the suffix I for an immediate operation or the suffix B for a backward jump; MMIXAL determines such things automatically. Thus, one never writes ADDI or JMPB in the source input to MMIXAL, although such opcodes might appear when a simulator or debugger or disassembler is presenting a numeric instruction in symbolic form.

14. MMIX operations like ADD require exactly three expressions as operands. The first two must be register numbers. The third must be either a register number or a pure number between 0 and 255; in the latter case, ADD becomes ADDI in the assembled output. Thus, for example, the command "set register 1 to the sum of register 2 and register 3" could be expressed as

ADD \$1,\$2,\$3

or as, say,

ADD x,y,y+1

if the equivalent of x is \$1 and the equivalent of y is \$2. The command "subtract 5 from register 1" could be expressed as

SUB \$1,\$1,5

or as

SUB x,x,5

but not as 'SUBI \$1,\$1,5' or 'SUBI x,x,5'.

MMIX operations like FLOT require either three operands (register, pure, register/pure) or only two (register, register/pure). In the first case the middle operand is the rounding mode, which is best expressed in terms of the predefined symbolic values ROUND\_CURRENT, ROUND\_OFF, ROUND\_UP, ROUND\_DOWN, ROUND\_NEAR, for (0,1,2,3,4) respectively. In the second case the middle operand is understood to be zero (namely, ROUND\_CURRENT).

MMIX operations like SETL or INCH, which involve a wyde intermediate constant, require exactly two operands, (register, pure). The value of the second operand should fit in two bytes.

MMIX operations like BNZ, which mention a register and a relative address, also require two operands. The first operand should be a register number. The second operand should yield a result r in the range  $-2^{16} \le r < 2^{16}$  when the current location is subtracted from it and the result is divided by 4. The second operand might also be undefined; in that case, the eventual value must satisfy the restriction stated for defined values. The opcodes GETA and PUSHJ are similar, except that the first operand to PUSHJ might also be pure (see below). The JMP operation is also similar, but it has only one operand, and it allows the larger address range  $-2^{24} \le r < 2^{24}$ .
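
The range conditions for relative addresses can be checked with a few lines of arithmetic. The sketch below assumes byte addresses held in 64-bit integers and a host that uses two's complement when converting to int64_t; it also treats a distance that is not a multiple of 4 as an error, which is an assumption of the sketch. The name fits_relative is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decide whether target can be reached from loc by a relative address.
   bits is 16 for branches, GETA, and PUSHJ, or 24 for JMP.
   Returns 1 if r = (target - loc)/4 satisfies -2^bits <= r < 2^bits.
   fits_relative is a hypothetical helper. */
int fits_relative(uint64_t loc, uint64_t target, int bits)
{
    int64_t diff = (int64_t)(target - loc);   /* signed distance in bytes */
    int64_t limit = (int64_t)1 << bits;
    int64_t r;
    assert(bits == 16 || bits == 24);
    if (diff % 4 != 0) return 0;              /* assumed: must be a tetrabyte multiple */
    r = diff / 4;
    return -limit <= r && r < limit;
}
```

For instance, a branch at location #1000 can reach #1000+4·65535 but not #1000+4·65536, while a JMP from the same place can reach both.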

MMIX operations that refer to memory, like LDO and STHT and GO, are treated like ADD if they have three operands, except that the first operand should be pure (not a register number) in the case of PRELD, PREGO, PREST, STCO, SYNCD, and SYNCID. These opcodes also accept a special two-operand form, in which the second operand stands for a base address and an immediate offset (see below).

The first operand of PUSHJ and PUSHGO can be either a pure number or a register number. In the first case ('PUSHJ 2,Sub' or 'PUSHGO 2,Sub') the programmer might be thinking "let's push down two registers"; in the second case ('PUSHJ \$2,Sub' or 'PUSHGO \$2,Sub') the programmer might be thinking "let's make register 2 the hole position for this subroutine call." Both cases result in the same assembled output.

The remaining MMIX opcodes are idiosyncratic:

```
NEG r,p,z;
PUT s,z;
GET r,s;
POP p,yz;
RESUME xyz;
SAVE r,0;
UNSAVE r;
SYNC xyz;
TRAP x,y,z or TRAP x,yz or TRAP xyz;
```

SWYM and TRIP are like TRAP. Here s is an integer between 0 and 31, preferably given by one of the predefined symbols rA, rB, ... for special register codes; r is a register number; p is a pure byte; x, y, and z are either register numbers or pure bytes; yz and xyz are pure values that fit respectively in two and three bytes.

All of these rules can be summarized by saying that MMIXAL treats each MMIX opcode in the most natural way. When there are three operands, they affect fields X, Y, and Z of the assembled MMIX instruction; when there are two operands, they affect fields X and YZ; when there is just one operand, it affects field XYZ.

15. In all cases when the opcode corresponds to an MMIX operation, the MMIXAL instruction tells the assembler to carry out four steps: (1) Align the current location so that it is a multiple of 4, by adding 1, 2, or 3 if necessary; (2) Define the equivalent of the label field to be the current location, if the label is nonempty; (3) Evaluate the operands and assemble the specified MMIX instruction into the current location; (4) Increase the current location by 4.

**16.** Now let's consider the pseudo-operations, starting with the simplest cases.

- ⟨label⟩ IS ⟨expression⟩ defines the value of the label to be the value of the expression, which must not be a future reference. The expression may be either pure or a register number.
- ⟨label⟩ LOC ⟨expression⟩ first defines the label to be the value of the current location, if the label is nonempty. Then the current location is changed to the value of the expression, which must be pure.

For example, 'LOC #1000' will start assembling subsequent instructions or data in the location whose hexadecimal value is #1000. 'X LOC @+500' defines X to be the address of the first of 500 bytes in memory; assembly will continue at location X + 500. The operation of aligning the current location to a multiple of 256, if it is not already aligned in that way, can be expressed as 'LOC @+(256-@)&255'.
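
The alignment idiom relies on & being a strong operator, so that (256-@)&255 is evaluated before the addition. A minimal sketch of the same computation on a 64-bit location follows; align256 is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round loc up to the next multiple of 256, leaving it unchanged if it is
   already aligned: the effect of 'LOC @+(256-@)&255'.
   align256 is a hypothetical helper. */
uint64_t align256(uint64_t loc)
{
    uint64_t r = loc + ((256 - loc) & 255);
    assert((r & 255) == 0);   /* the result is always a multiple of 256 */
    return r;
}
```

The quantity (256 - loc) & 255 is the distance to the next boundary, which is zero when loc is already a multiple of 256.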

A less trivial example arises if we want to emit instructions and data into two separate areas of memory, but we want to intermix them in the MMIXAL source file. We could start by defining 8H and 9H to be the starting addresses of the instruction and data segments, respectively. Then, a sequence of instructions could be enclosed in 'LOC 8B; ...; 8H IS @'; a sequence of data could be enclosed in 'LOC 9B; ...; 9H IS @'. Any number of such sequences could then be combined. Instead of the two pseudo-instructions '8H IS @; LOC 9B' one could in fact write simply '8H LOC 9B' when switching from instructions to data.

- PREFIX ⟨symbol⟩ redefines the current prefix to be the given symbol (fully qualified). The label field should be blank.

17. The next pseudo-operations assemble bytes, wydes, tetrabytes, or octabytes of data.

- ⟨label⟩ BYTE ⟨expression list⟩ defines the label to be the current location, if the label field is nonempty; then it assembles one byte for each expression in the expression list, and advances the current location by the number of bytes. The expressions should all be pure numbers that fit in one byte.

String constants are often used in such expression lists. For example, if the current location is #1000, the instruction BYTE "Hello",0 assembles six bytes containing the constants 'H', 'e', 'l', 'l', 'o', and 0 into locations #1000, ..., #1005, and advances the current location to #1006.

- ⟨label⟩ WYDE ⟨expression list⟩ is similar, but it first makes the current location even, by adding 1 to it if necessary. Then it defines the label (if a nonempty label is present), and assembles each expression as a two-byte value. The current location is advanced by twice the number of expressions in the list. The expressions should all be pure numbers that fit in two bytes.
- ⟨label⟩ TETRA ⟨expression list⟩ is similar, but it aligns the current location to a multiple of 4 before defining the label; then it assembles each expression as a four-byte value. The current location is advanced by 4n if there are n expressions in the list. Each expression should be a pure number that fits in four bytes.
- ⟨label⟩ OCTA ⟨expression list⟩ is similar, but it first aligns the current location to a multiple of 8; it assembles each expression as an eight-byte value. The current location is advanced by 8n if there are n expressions in the list. Any or all of the expressions may be future references, but they should all be defined as pure numbers eventually.
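
The effect of the four data pseudo-operations on the current location can be summarized in one small function. This is only a sketch of the bookkeeping, not of the assembly of the data itself; after_data and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model how BYTE, WYDE, TETRA, and OCTA move the current location.
   size is 1, 2, 4, or 8; n is the number of expressions in the list.
   The aligned location, which becomes the equivalent of the label,
   is stored in *label; the new current location is returned.
   after_data is a hypothetical helper. */
uint64_t after_data(uint64_t loc, unsigned size, unsigned n, uint64_t *label)
{
    uint64_t mask = (uint64_t)size - 1;
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    loc = (loc + mask) & ~mask;        /* align first (no change when size is 1) */
    *label = loc;                      /* then define the label */
    return loc + (uint64_t)size * n;   /* then advance past the data */
}
```

For example, six bytes assembled at #1000 leave the current location at #1006, and a WYDE with three expressions at #1001 defines its label as #1002 and leaves the current location at #1008.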

18. Global registers are important for accessing memory in MMIX programs. They could be allocated by hand, and defined with IS instructions, but MMIXAL provides a mechanism that is usually much more convenient:

- ⟨label⟩ GREG ⟨expression⟩ allocates a new global register, and assigns its number as the equivalent of the label. At the beginning of assembly, the current global threshold G is \$255. Each distinct GREG instruction decreases G by 1; the final value of G will be the initial value of rG when the assembled program is loaded.

The value of the expression will be loaded into the global register at the beginning of the program. If this value is nonzero, it should remain constant throughout the program execution; such global registers are considered to be base addresses. Two or more base addresses with the same constant value are assigned to the same global register number.

Base addresses can simplify memory accesses in an important way. Suppose, for example, five octabyte values appear in a data segment, and their addresses are called AA, BB, CC, DD, and EE:

AA LOC @+8;BB LOC @+8;CC LOC @+8;DD LOC @+8;EE LOC @+8

Then if you say Base GREG AA, you will be able to write simply 'LDO \$1,AA' to bring AA into register \$1, and 'LDO \$2,CC' to bring CC into register \$2.

Here's how it works: Whenever a memory operation such as LDO or STB or GO has only two operands, the second operand should be a pure number whose value can be expressed as  $b+\delta$ , where  $0 \le \delta < 256$  and b is the value of a base address in one of the preceding GREG commands. The MMIXAL processor will find the closest base address and manufacture an appropriate command. For example, the instruction 'LDO \$2,CC' in the example of the preceding paragraph would be converted automatically to 'LDO \$2,Base,16'.
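
The search for a suitable base address can be sketched as a scan over the values of the base registers, keeping the one with the smallest offset δ. The array representation and the name find_base are assumptions made for illustration; MMIXAL's own bookkeeping for global registers is described later in the program.

```c
#include <assert.h>
#include <stdint.h>

/* Given the constant values held by n base-address registers, find one
   with 0 <= addr - base < 256, preferring the smallest offset.
   Returns the index of the chosen entry and stores the offset in *delta,
   or returns -1 if no base address is close enough.
   find_base and its parameters are hypothetical. */
int find_base(const uint64_t *base, int n, uint64_t addr, unsigned *delta)
{
    int best = -1, i;
    uint64_t best_d = 256;               /* offsets must be smaller than this */
    assert(n >= 0 && delta != 0);
    for (i = 0; i < n; i++) {
        uint64_t d = addr - base[i];     /* wraps to a huge value if base > addr */
        if (d < best_d) { best = i; best_d = d; }
    }
    if (best >= 0) *delta = (unsigned)best_d;
    return best;
}
```

In the example with AA, BB, CC, DD, and EE, the only base address is AA, and the address CC lies 16 bytes beyond it, so the offset found is 16.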

If no base address is close enough, an error message will be generated, unless this program is run with the -x option on the command line. The -x option inserts additional instructions if necessary, using global register 255, so that any address is accessible. For example, if there is no base address that allows LDO \$2,FF to be implemented in a single instruction, but if FF equals Base+1000, then the -x option would assemble two instructions

SETL \$255,1000; LDO \$2,Base,\$255

in place of LDO \$2,FF. Caution: The -x feature makes the number of actual MMIX instructions hard to predict, so extreme care must be used if your style of coding includes relative branch instructions in dangerous forms like 'BNZ x,@+8'.

This base address convention can be used also with the alias operation LDA. For example, 'LDA \$3,CC' loads the address of CC into register 3, by assembling the instruction 'ADDU \$3,Base,16'.

MMIXAL also allows a two-operand form for memory operations such as

LDO \$1,\$2

to be an abbreviation for 'LDO \$1,\$2,0'.

When MMIXAL programs use subroutines with a memory stack in addition to the built-in register stack, they usually begin with the instructions 'sp GREG 0; fp GREG 0'; these instructions allocate a *stack pointer* sp=\$254 and a *frame pointer* fp=\$253. However, subroutine libraries are free to implement any conventions for global registers and stacks that they like.

19. Short programs rarely run out of global registers, but long programs need a mechanism to check that GREG hasn't been used too often. The following pseudo-instruction provides the necessary safety valve:

- LOCAL ⟨expression⟩ ensures that the expression will be a local register in the program being assembled. The expression should be a register number, and the label field should be blank. At the close of assembly, MMIXAL will report an error if the final value of G does not exceed all register numbers that are declared local in this way.

A LOCAL instruction need not be given unless the register number is 32 or more. (MMIX always considers \$0 through \$31 to be local, so MMIXAL implicitly acts as if the instruction 'LOCAL \$31' were present.)

**20.** Finally, there are two pseudo-instructions to pass information and hints to the loading routine and/or to debuggers that will be using the assembled program.

- BSPEC ⟨expression⟩ begins "special mode"; the ⟨expression⟩ should have a value that fits in two bytes, and the label field should be blank.
- ESPEC ends "special mode"; the operand field is ignored, and the label field should be blank.

All material assembled between BSPEC and ESPEC is passed directly to the output, but not loaded as part of the assembled program. Ordinary MMIX instructions cannot appear in special mode; only the pseudo-operations IS, PREFIX, BYTE, WYDE, TETRA, OCTA, GREG, and LOCAL are allowed. The operand of BSPEC should have a value that fits in two bytes; this value identifies the kind of data that follows. (For example, BSPEC 0 might introduce information about subroutine calling conventions at the current location, and BSPEC 1 might introduce line numbers from a high-level-language program that was compiled into the code at the current place. System routines often need to pass such information through an assembler to the operating system, hence MMIXAL provides a general-purpose conduit.)

21. A program should begin at the special symbolic location Main (more precisely, at the address corresponding to the fully qualified symbol :Main). This symbol always has serial number 1, and it must always be defined.

Locations should not receive assembled data more than once. (More precisely, the loader will load the bitwise xor of all the data assembled for each byte position; but the general rule "do not load two things into the same byte" is safest.) All locations that do not receive assembled data are initially zero, except that the loading routine will put register stack data into segment 3, and the operating system may put command line data and debugger data into segment 2. (The rudimentary MMIX operating system starts a program with the number of command line arguments in \$0, and a pointer to the beginning of an array of argument pointers in \$1.) Segments 2 and 3 should not get assembled data, unless the user is a true hacker who is willing to take the risk that such data might crash the system.
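The xor rule in the parenthetical remark can be made concrete with a small sketch. It assumes that memory starts out zero, as stated above, and models a single aligned tetrabyte; the name `xor_load` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one aligned tetrabyte of memory, initially zero.
   Each assembled tetrabyte is xored into the cell rather than stored. */
void xor_load(uint32_t *cell, uint32_t assembled)
{
    *cell ^= assembled;
}
```

If two pieces of data touch disjoint bytes of the same tetrabyte, say #61620000 and then #00006364, the cell ends up holding #61626364, exactly as if the bytes had been stored one by one. If they overlap, the result is the xor of the two, which is rarely what the programmer wanted; hence the general rule.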

22. Binary MMO output. When the MMIXAL processor assembles a file called foo.mms, it produces a binary output file called foo.mmo. (The suffix mms stands for "MMIX symbolic," and mmo stands for "MMIX object.") Such mmo files have a simple structure consisting of a sequence of tetrabytes. Some of the tetrabytes are instructions to a loading routine; others are data to be loaded.

Loader instructions are distinguished from tetrabytes of data by their first (most significant) byte, which has the special escape-code value #98, called mm in the program below. This code value corresponds to MMIX's opcode LDVTS, which is unlikely to occur in tetras of data. The second byte X of a loader instruction is the loader opcode, called the *lopcode*. The third and fourth bytes, Y and Z, are operands. Sometimes they are combined into a single 16-bit operand called YZ.

#define mm #98
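As a minimal sketch of the layout just described, the following C fragment splits a tetrabyte into the escape byte, the lopcode X, and the operands Y, Z, and YZ. The type `mmo_fields` and the function `split_tetra` are hypothetical names chosen for this illustration, and the tetrabyte is assumed to have been read big-endian into a 32-bit value.

```c
#include <stdint.h>

#define MM 0x98 /* the escape code, called mm in the text */

/* Hypothetical container for the four bytes of one mmo tetrabyte. */
typedef struct {
    int is_escape;  /* nonzero if the first byte is the escape code */
    uint8_t x;      /* second byte: the lopcode */
    uint8_t y, z;   /* third and fourth bytes: the operands */
    uint16_t yz;    /* Y and Z combined into one 16-bit operand */
} mmo_fields;

/* Split a tetrabyte whose most significant byte came first in the file. */
mmo_fields split_tetra(uint32_t t)
{
    mmo_fields f;
    f.is_escape = ((t >> 24) == MM);
    f.x = (uint8_t)((t >> 16) & 0xff);
    f.y = (uint8_t)((t >> 8) & 0xff);
    f.z = (uint8_t)(t & 0xff);
    f.yz = (uint16_t)(t & 0xffff);
    return f;
}
```

For example, the tetrabyte #98012001 splits into X = 1, Y = #20, Z = 1. The sketch looks at one tetrabyte in isolation; a real loader must also remember context, because a data tetrabyte is allowed to begin with #98 when the loader has been told to expect that.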

23. A small, contrived example will help explain the basic ideas of mmo format. Consider the following input file, called test.mms:

```
% A peculiar example of MMIXAL
     LOC   Data_Segment      % location #2000000000000000
     OCTA  1F                % a future reference
a    GREG  @                 % $254 is base address for ABCD
ABCD BYTE  "ab"              % two bytes of data
     LOC   #123456789        % switch to the instruction segment
Main JMP   1F                % another future reference
     LOC   @+#4000           % skip past 16384 bytes
2H   LDB   $3,ABCD+1         % use the base address
     BZ    $3,1F; TRAP       % and refer to the future again
# 3 "foo.mms"                % this comment is a line directive
     LOC   2B-4*10           % move 10 tetras before previous location
1H   JMP   2B                % resolve previous references to 1F
     BSPEC 5                 % begin special data of type 5
     TETRA &a<<8             % four bytes of special data
     WYDE  a-$0              % two more bytes of special data
     ESPEC                   % end a special data packet
     LOC   ABCD+2            % resume the data segment
     BYTE  "cd",#98          % assemble three more bytes of data
```

It defines a silly program that essentially puts 'b' into register 3; the program halts when it gets to an all-zero TRAP instruction following the BZ. But the assembled output of this file illustrates most of the features of MMIX objects, and in fact test.mms was the first test file tried by the author when the MMIXAL processor was originally written.

The binary output file test.mmo assembled from test.mms consists of the following tetrabytes, shown in hexadecimal notation with brief comments. Fuller explanations appear with the descriptions of individual lopcodes below.

```
98090101  lop_pre 1,1 (preamble, version 1, 1 tetra)
36f4a363  (the file creation time)
98012001  lop_loc #20,1 (data segment, 1 tetra)
00000000  (low tetrabyte of address in data segment)
00000000  (high tetrabyte of OCTA 1F)
00000000  (low tetrabyte, will be fixed up later)
61620000  ("ab", padded with trailing zeros)
98010002  lop_loc 0,2 (instruction segment, 2 tetras)
00000001  (high tetrabyte of address in instruction segment)
2345678c  (low tetrabyte of address, after alignment)
98060002  lop_file 0,2 (file name 0, 2 tetras)
74657374  ("test")
2e6d6d73  (".mms")
98070007  lop_line 7 (line 7 of the current file)
f0000000  (JMP 1F, will be fixed up later)
98024000  lop_skip #4000 (advance 16384 bytes)
98070009  lop_line 9 (line 9 of the current file)
8103fe01  (LDB $3,a,1, uses base address a)
42030000  (BZ $3,1F, will be fixed later)
9807000a  lop_line 10 (stay on line 10)
00000000  (TRAP)
98010002  lop_loc 0,2 (instruction segment, 2 tetras)
00000001  (high tetrabyte of address in instruction segment)
2345a768  (low tetrabyte of address 1H)
98050010  lop_fixrx 16 (fix 16-bit relative address)
0100fff5  (fixup for location @-4*-11)
98040ff7  lop_fixr #ff7 (fix @-4*#ff7)
98032001  lop_fixo #20,1 (data segment, 1 tetra)
00000000  (low tetrabyte of data segment address to fix)
98060102  lop_file 1,2 (file name 1, 2 tetras)
666f6f2e  ("foo.")
6d6d7300  ("mms",0)
98070004  lop_line 4 (line 4 of the current file)
f000000a  (JMP 2B)
98080005  lop_spec 5 (begin special data of type 5)
00000200  (TETRA &a<<8)
00fe0000  (WYDE a-$0)
98012001  lop_loc #20,1 (data segment, 1 tetra)
0000000a  (low tetrabyte of address in data segment)
00006364  ("cd" with leading zeros, because of alignment)
98000001  lop_quote (don't treat next tetrabyte as a lopcode)
98000000  (BYTE #98, padded with trailing zeros)
980a00fe  lop_post $254 (begin postamble, G is 254)
20000000  (high tetrabyte of the initial contents of $254)
00000008  (low tetrabyte of base address $254)
00000001  (high tetrabyte of the initial contents of $255)
2345678c  (low tetrabyte of $255, is address of Main)
980b0000  lop_stab (begin symbol table)
203a5040  (compressed form for symbol table as a ternary trie)
50404020
41204220
43094408
83404020  (ABCD = #2000000000000008, serial 3)
4d206120
69056e01
2345678c  (Main = #000000012345678c, serial 1)
81400f61
fe820000  (a = $254, serial 2)
980c000a  lop_end (end symbol table, 10 tetras)
```

**24.** When a tetrabyte of the mmo file does not begin with the escape code, it is loaded into the current location  $\lambda$ , and  $\lambda$  is increased to the next higher multiple of 4. (If  $\lambda$  is not a multiple of 4, the tetrabyte actually goes into location  $\lambda \wedge (-4) = 4\lfloor \lambda/4 \rfloor$ , according to MMIX's usual conventions.) The current line number is also increased by 1, if it is nonzero.
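The placement rule for ordinary tetrabytes can be sketched as follows. This is an illustration only: `loader_state` and `place_tetra` are hypothetical names, the 64-bit location is held in a single `uint64_t` for brevity, and the sketch computes addresses without touching any memory.

```c
#include <stdint.h>

/* Hypothetical loader state: the current location and the current line number. */
typedef struct {
    uint64_t loc;   /* the current location, lambda */
    uint32_t line;  /* the current line number, or 0 if unknown */
} loader_state;

/* Return the address that receives an ordinary tetrabyte, then advance the state. */
uint64_t place_tetra(loader_state *s)
{
    uint64_t target = s->loc & ~(uint64_t)3; /* lambda AND -4, i.e. 4*floor(lambda/4) */
    s->loc = target + 4;                     /* next higher multiple of 4 */
    if (s->line != 0) s->line++;             /* line number advances only if nonzero */
    return target;
}
```

Starting from λ = #200000000000000a, for instance, the tetrabyte goes to #2000000000000008 and λ becomes #200000000000000c.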

When a tetrabyte does begin with the escape code, its next byte is the lopcode defining a loader instruction. There are thirteen lopcodes:

- $lop\_quote$ : X = #00, YZ = 1. Treat the next tetra as an ordinary tetrabyte, even if it begins with the escape code.
- $lop\_loc$ : X = #01, Y = high byte, Z = tetra count (Z = 1 or 2). Set the current location to the 64-bit address defined by the next Z tetras, plus  $2^{56}Y$ . Usually Y = 0 (for the instruction segment) or Y = #20 (for the data segment). If Z = 2, the high tetra appears first.
- $lop\_skip$ : X = #02, YZ = delta. Increase the current location by YZ.
- $lop\_fixo$ : X = #03, Y = high byte, Z = tetra count (Z = 1 or 2). Load the value of the current location  $\lambda$  into octabyte P, where P is the 64-bit address defined by the next Z tetras plus  $2^{56}$ Y as in  $lop\_loc$ . (The octabyte at P was previously assembled as zero because of a future reference.)
- $lop\_fixr$ : X = #04, YZ = delta. Load YZ into the YZ field of the tetrabyte in location P, where P is  $\lambda-4$ YZ, namely the address that precedes the current location by YZ tetrabytes. (This tetrabyte was previously loaded with an MMIX instruction that takes a relative address: a branch, probable branch, JMP, PUSHJ, or GETA. Its YZ field was previously assembled as zero because of a future reference.)
- $lop\_fixrx$: X = #05, Y = 0, Z = 16 or 24. Proceed as in $lop\_fixr$, but load $\delta$ into tetrabyte $P = \lambda - 4\delta$ instead of loading YZ into $P = \lambda - 4YZ$. Here $\delta$ is the value of the tetrabyte following the $lop\_fixrx$ instruction; its leading byte will be either 0 or 1. If the leading byte is 1, $\delta$ should be treated as the negative number $(\delta \wedge {\#ffffff}) - 2^Z$ when calculating the address P. (The latter case arises only rarely, but it is needed when fixing up a relative "future" reference that ultimately leads to a "backward" instruction. The value of $\delta$ that is xored into location P in such cases will change BZ to BZB, or JMP to JMPB, etc.; we have Z = 24 when fixing a JMP, Z = 16 otherwise.)
- $lop\_file$ : X = #06, Y = file number, Z = tetra count. Set the current file number to Y and the current line number to zero. If this file number has occurred previously, Z should be zero; otherwise Z should be positive, and the next Z tetrabytes are the characters of the file name in big-endian order. Trailing zeros follow the file name if its length is not a multiple of 4.
- $lop\_line$: X = #07, YZ = line number. Set the current line number to YZ. If the line number is nonzero, the current file and current line should correspond to the source location that generated the next data to be loaded, for use in diagnostic messages. (The MMIXAL processor gives precise line numbers to the sources of tetrabytes in segment 0, which tend to be instructions, but not to the sources of tetrabytes assembled in other segments.)
- $lop\_spec$: X = #08, YZ = type. Begin special data of type YZ. The subsequent tetrabytes, continuing until the next loader operation other than $lop\_quote$, comprise the special data. A $lop\_quote$ instruction allows tetrabytes of special data to begin with the escape code.

- $lop\_pre$ : X = #09, Y = 1, Z = tetra count. A  $lop\_pre$  instruction, which defines the "preamble," must be the first tetrabyte of every mmo file. The Y field specifies the version number of mmo format, currently 1; other version numbers may be defined later, but version 1 should always be supported as described in the present document. The Z tetrabytes following a  $lop\_pre$  command provide additional information that might be of interest to system routines. If Z > 0, the first tetra of additional information records the time that this mmo file was created, measured in seconds since 00:00:00 Greenwich Mean Time on 1 Jan 1970.
- $lop\_post$: X = #0a, Y = 0, Z = G (must be 32 or more). This instruction begins the postamble, which follows all instructions and data to be loaded. It causes the loaded program to begin with rG equal to the stated value of G, and with \$G, \$G+1, ..., \$255 initially set to the values of the next (256-G)\*2 tetrabytes. These tetrabytes specify 256-G octabytes in big-endian fashion (high half first).
- $lop\_stab$ : X = #0b, YZ = 0. This instruction must appear immediately after the (256-G)\*2 tetrabytes following  $lop\_post$ . It is followed by the symbol table, which lists the equivalents of all user-defined symbols in a compact form that will be described later.
- $lop\_end$: X = #0c, YZ = tetra count. This instruction must be the very last tetrabyte of each mmo file. Furthermore, exactly YZ tetrabytes must appear between it and the $lop\_stab$ command. (Therefore a program can easily find the symbol table without reading forward through the entire mmo file.)
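The address arithmetic of $lop\_fixrx$ is the least obvious of the thirteen, so here is a hedged sketch of it. The function names `fixrx_offset` and `fixrx_target` are hypothetical; the sketch only computes the address P and leaves the xoring of $\delta$ into memory aside.

```c
#include <stdint.h>

/* Interpret delta as a signed count of tetrabytes; z is the Z field (16 or 24).
   A leading byte of 1 marks the negative case (delta AND #ffffff) - 2^Z. */
int64_t fixrx_offset(uint32_t delta, int z)
{
    int64_t low = (int64_t)(delta & 0xffffff);
    if ((delta >> 24) == 1)
        return low - ((int64_t)1 << z);
    return low;
}

/* Address P = lambda - 4*delta of the tetrabyte to be fixed (arithmetic mod 2^64). */
uint64_t fixrx_target(uint64_t lambda, uint32_t delta, int z)
{
    return lambda - 4 * (uint64_t)fixrx_offset(delta, z);
}
```

With $\delta$ = #0100fff5 and Z = 16 the offset is −11, so from $\lambda$ = #12345a768 the tetrabyte to be fixed lies 44 bytes ahead, at #12345a794.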

A separate routine called MMOtype is available to translate binary mmo files into human-readable form.

```
#define lop_quote #0   /* the quotation lopcode */
#define lop_loc #1     /* the location lopcode */
#define lop_skip #2    /* the skip lopcode */
#define lop_fixo #3    /* the octabyte-fix lopcode */
#define lop_fixr #4    /* the relative-fix lopcode */
#define lop_fixrx #5   /* extended relative-fix lopcode */
#define lop_file #6    /* the file name lopcode */
#define lop_line #7    /* the file position lopcode */
#define lop_spec #8    /* the special hook lopcode */
#define lop_pre #9     /* the preamble lopcode */
#define lop_post #a    /* the postamble lopcode */
#define lop_stab #b    /* the symbol table lopcode */
#define lop_end #c     /* the end-it-all lopcode */
```

25. Many readers will have noticed that MMIXAL has no facilities for relocatable output, nor does mmo format support such features. The author's first drafts of MMIXAL and mmo did allow relocatable objects, with external linkages, but the rules were substantially more complicated and therefore inconsistent with the goals of The Art of Computer Programming. The present design might actually prove to be superior to the current practice, now that computer memory is significantly cheaper than it used to be, because one-pass assembly and loading are extremely fast when relocatability and external linkages are disallowed. Different program modules can be assembled together about as fast as they could be linked together under a relocatable scheme, and they can communicate with each other in much more flexible ways. Debugging tools are enhanced when open-source libraries are combined with user programs, and such libraries will certainly improve in quality when their source form is accessible to a larger community of users.

**26.** Basic data types. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing was limited in that way. Details of the basic arithmetic appear in a separate program module called MMIX-ARITH, because the same routines are needed also for the simulators. The definition of type **tetra** should be changed, if necessary, to conform with the definitions found in MMIX-ARITH.

```
⟨Type definitions 26⟩ ≡
  typedef unsigned int tetra;   /* assumes that an int is exactly 32 bits wide */
  typedef struct {
    tetra h, l;
  } octa;                       /* two tetrabytes make one octabyte */
  typedef enum {
    false, true
  } bool;
See also sections 30, 54, 58, 62, 68, and 82.
This code is used in section 136.

27. ⟨Global variables 27⟩ ≡
  extern octa zero_octa;   /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one;     /* neg_one.h = neg_one.l = -1 */
  extern octa aux;         /* auxiliary output of a subroutine */
  extern bool overflow;    /* set by certain subroutines for signed arithmetic */
See also sections 33, 36, 37, 43, 46, 51, 56, 60, 63, 67, 69, 77, 83, 90, 105, 120, 133, 139, and 143.
This code is used in section 136.
```

**28.** Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Division inputs the high half of a dividend in the global variable aux and returns the remainder in aux.

```
⟨Subroutines 28⟩ ≡
  extern octa oplus ARGS((octa y, octa z));               /* unsigned y + z */
  extern octa ominus ARGS((octa y, octa z));              /* unsigned y - z */
  extern octa incr ARGS((octa y, int delta));             /* unsigned y + δ (δ is signed) */
  extern octa oand ARGS((octa y, octa z));                /* y ∧ z */
  extern octa shift_left ARGS((octa y, int s));           /* y ≪ s, 0 < s < 64 */
  extern octa shift_right ARGS((octa y, int s, int u));   /* y ≫ s, signed if ¬u */
  extern octa omult ARGS((octa y, octa z));               /* unsigned (aux, x) = y × z */
  extern octa odiv ARGS((octa x, octa y, octa z));        /* unsigned (x, y)/z; aux = (x, y) mod z */
See also sections 41, 42, 44, 45, 47, 48, 49, 50, 52, 55, 57, 59, 73, and 74.
```

This code is used in section 136.
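To show what "octabyte arithmetic on 32-bit halves" involves, here is a minimal sketch of an addition in the spirit of *oplus*. It is not taken from MMIX-ARITH; the name `oplus_sketch` is hypothetical, and `uint32_t` is used so that the 32-bit assumption holds regardless of the width of **int**.

```c
#include <stdint.h>

typedef uint32_t tetra;          /* exactly 32 bits, by construction */
typedef struct { tetra h, l; } octa;

/* Unsigned y + z modulo 2^64, computed on two 32-bit halves. */
octa oplus_sketch(octa y, octa z)
{
    octa x;
    x.l = y.l + z.l;                          /* low halves, wrapping mod 2^32 */
    x.h = y.h + z.h + (x.l < y.l ? 1u : 0u);  /* a wrapped low sum means a carry */
    return x;
}
```

The carry test relies on unsigned wraparound: the low sum is smaller than one of its operands exactly when it overflowed 32 bits.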

```
ARGS = macro (), §31.
aux: octa, MMIX-ARITH §4.
incr: octa (), MMIX-ARITH §6.
neg_one: octa, MMIX-ARITH §4.
oand: octa (), MMIX-ARITH §25.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
oplus: octa (), MMIX-ARITH §5.
overflow: bool, MMIX-ARITH §4.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
zero_octa: octa, MMIX-ARITH §4.
```

29. Here's a rudimentary check to see if arithmetic is in trouble.

```
⟨Initialize everything 29⟩ ≡
  acc = shift_left(neg_one, 1);
  if (acc.h ≠ #ffffffff) panic("Type tetra is not implemented correctly");
See also sections 32, 61, 71, 84, 91, and 140.
```

This code is used in section 136.
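Why this test catches a wrong **tetra** can be seen from a small sketch. It assumes a one-bit left shift written without any masking, so that excess bits in a too-wide **tetra** survive; `shift_left_one` and `tetra_looks_32_bit` are hypothetical names, not routines of MMIX-ARITH.

```c
typedef unsigned int tetra;      /* the definition under test */
typedef struct { tetra h, l; } octa;

/* Shift an octabyte left by one bit, carrying the top bit of l into h.
   Nothing is masked, so bits beyond the 32nd would remain visible in h. */
octa shift_left_one(octa y)
{
    octa x;
    x.h = (y.h << 1) | (y.l >> 31);
    x.l = y.l << 1;
    return x;
}

/* Nonzero if shifting -1 left by one leaves exactly #ffffffff in the high half. */
int tetra_looks_32_bit(void)
{
    octa neg_one = { (tetra)-1, (tetra)-1 };
    octa acc = shift_left_one(neg_one);
    return acc.h == 0xffffffffu;
}
```

On a machine where **unsigned int** has 32 bits the high half comes out as #ffffffff; with a wider type the bits shifted past position 31 would make the comparison fail.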

**30.** Future versions of this program will work with symbols formed from Unicode characters, but the present code limits itself to an 8-bit subset. The type **Char** is defined here in order to ease the later transition: At present, **Char** is the same as **char**, but **Char** can be changed to a 16-bit type in the Unicode version.

Other changes will also be necessary when the transition to Unicode is made; for example, some calls of *fprintf* will become calls of *fwprintf*, and some occurrences of %s will become %ls in print formats. The switchable type name Char provides at least a first step towards a brighter future with Unicode.

```
⟨Type definitions 26⟩ +≡

typedef char Char; /* bytes that will become wydes some day */
```

**31.** While we're talking about classic systems versus future systems, we might as well define the ARGS macro, which makes function prototypes available on ANSI C systems without making them uncompilable on older systems. Each subroutine below is declared first with a prototype, then with an old-style definition.

```
⟨Preprocessor definitions 31⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
See also section 39.
```

This code is used in section 136.
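A short usage sketch may help readers who have not met this idiom. The function `label_length` is a hypothetical example, not part of MMIXAL, and its definition is written in prototype style because old-style definitions are no longer accepted by the most recent C standard.

```c
#include <stddef.h>

#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

/* The double parentheses make the whole parameter list a single macro argument.
   On an ANSI C system this line expands to a full prototype; otherwise to (). */
size_t label_length ARGS((const char *line));

/* Count the characters before the first space or the end of the string. */
size_t label_length(const char *line)
{
    size_t n = 0;
    while (line[n] != '\0' && line[n] != ' ') n++;
    return n;
}
```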

**32.** Basic input and output. Input goes into a buffer that is normally limited to 72 characters. This limit can be raised, by using the -b option when invoking the assembler; but short buffers will keep listings from becoming unwieldy, because a symbolic listing adds 19 characters per line.

```
⟨Initialize everything 29⟩ +≡
  if (buf_size < 72) buf_size = 72;
  buffer = (Char *) calloc(buf_size + 1, sizeof(Char));
  lab_field = (Char *) calloc(buf_size + 1, sizeof(Char));
  op_field = (Char *) calloc(buf_size, sizeof(Char));
  operand_list = (Char *) calloc(buf_size, sizeof(Char));
  err_buf = (Char *) calloc(buf_size + 60, sizeof(Char));
  if (¬buffer ∨ ¬lab_field ∨ ¬op_field ∨ ¬operand_list ∨ ¬err_buf)
    panic("No room for the buffers");

33. ⟨Global variables 27⟩ +≡
  Char *buffer;         /* raw input of the current line */
  Char *buf_ptr;        /* current position within buffer */
  Char *lab_field;      /* copy of the label field of the current instruction */
  Char *op_field;       /* copy of the opcode field of the current instruction */
  Char *operand_list;   /* copy of the operand field of the current instruction */
  Char *err_buf;        /* place where dynamic error messages are sprinted */

34. ⟨Get the next line of input text, or break if the input has ended 34⟩ ≡
  if (¬fgets(buffer, buf_size + 1, src_file)) break;
  line_no++;
  line_listed = false;
  j = strlen(buffer);
  if (buffer[j - 1] ≡ '\n') buffer[j - 1] = '\0';   /* remove the newline */
  else if ((j = fgetc(src_file)) ≠ EOF) ⟨Flush the excess part of an overlong line 35⟩;
  if (buffer[0] ≡ '#') ⟨Check for a line directive 38⟩;
  buf_ptr = buffer;
This code is used in section 136.
```

```
__STDC__, Standard C.
acc: octa, §83.
buf_size: int, §139.
calloc: void *(), <stdlib.h>.
EOF = (-1), <stdio.h>.
false = 0, §26.
fgetc: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
fprintf: int (), <stdio.h>.
fwprintf: int (), multibyte string function.
h: tetra, §26.
j: register int, §136.
line_listed: bool, §36.
line_no: int, §36.
neg_one: octa, MMIX-ARITH §4.
panic = macro (), §45.
shift_left: octa (), MMIX-ARITH §7.
src_file: FILE *, §139.
strlen: size_t (), <string.h>.
```

```
35. ⟨Flush the excess part of an overlong line 35⟩ ≡
  {
    while (j ≠ '\n' ∧ j ≠ EOF) j = fgetc(src_file);
    if (¬long_warning_given) {
      long_warning_given = true;
      err("*trailing characters of long input line have been dropped");
      fprintf(stderr,
          "(say '-b <number>' to increase the length of my input buffer)\n");
    } else err("*trailing characters dropped");
  }
This code is used in section 34.

36. ⟨Global variables 27⟩ +≡
  int cur_file;              /* index of the current file in filename */
  int line_no;               /* current position in the file */
  bool line_listed;          /* have we listed the buffer contents? */
  bool long_warning_given;   /* have we given the hint about -b? */
```
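The line-reading logic of sections 34 and 35 can be tried in isolation with a sketch like the following. It is a simplified stand-in, not the assembler's code: `read_line` is a hypothetical name, warnings are replaced by a return code, an empty read is guarded against, and a line that exactly fills the buffer is not counted as truncated.

```c
#include <stdio.h>
#include <string.h>

/* Read one line of at most size characters into buf (which holds size+1 chars).
   Returns -1 at end of input, 1 if trailing characters were dropped, else 0. */
int read_line(FILE *f, char *buf, int size)
{
    size_t len;
    int c;
    if (!fgets(buf, size + 1, f)) return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';                       /* remove the newline */
        return 0;
    }
    c = fgetc(f);
    if (c == EOF || c == '\n') return 0;           /* nothing was actually lost */
    while (c != '\n' && c != EOF) c = fgetc(f);    /* flush the excess part */
    return 1;
}
```

A caller would loop on `read_line` until it returns −1 and issue the warning about '-b' the first time it returns 1.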

**37.** We keep track of source file name and line number at all times, for error reporting and for synchronization data in the object file. Up to 256 different source file names can be remembered.

```
⟨Global variables 27⟩ +≡

Char *filename [257]; /* source file names, including those in line directives */

int filename_count; /* how many filename entries have we filled? */
```

**38.** If the current line is a line directive, it will also be treated as a comment by the assembler.

```
⟨Check for a line directive 38⟩ ≡
  {
    for (p = buffer + 1; isspace(*p); p++) ;
    for (j = 0; isdigit(*p); p++) j = 10 * j + *p - '0';
    for (; isspace(*p); p++) ;
    if (*p ≡ '\"') {
      if (¬filename[filename_count]) {
        filename[filename_count] = (Char *) calloc(FILENAME_MAX + 1, sizeof(Char));
        if (¬filename[filename_count])
          panic("Capacity exceeded: Out of filename memory");
      }
      for (p++, k = 0; *p ∧ *p ≠ '\"' ∧ k < FILENAME_MAX; p++, k++)
        filename[filename_count][k] = *p;
      if (k ≡ FILENAME_MAX) panic("Capacity exceeded: File name too long");
      if (*p ≡ '\"' ∧ *(p - 1) ≠ '\"') {   /* yes, it's a line directive */
        filename[filename_count][k] = '\0';
        for (k = 0; strcmp(filename[k], filename[filename_count]) ≠ 0; k++) ;
        if (k ≡ filename_count) {
          if (filename_count ≡ 256)
            panic("Capacity exceeded: More than 256 file names");
          filename_count++;
        }
        cur_file = k;
        line_no = j - 1;
      }
    }
  }
```

This code is used in section 34.
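The scanning steps of section 38 can be exercised on their own with a sketch such as this one. It is an assumption-laden simplification: `parse_line_directive` is a hypothetical name, the file-name table is replaced by a caller-supplied buffer, and an overlong name is rejected instead of causing a panic.

```c
#include <ctype.h>
#include <stddef.h>

/* Recognize a line of the form  # <number> "<name>"  and extract both parts.
   Returns 1 for a line directive, 0 for an ordinary comment line. */
int parse_line_directive(const char *line, int *number, char *name, size_t name_size)
{
    const char *p = line;
    size_t k = 0;
    int n = 0;
    if (*p != '#') return 0;
    for (p++; isspace((unsigned char)*p); p++) ;
    for (; isdigit((unsigned char)*p); p++) n = 10 * n + (*p - '0');
    for (; isspace((unsigned char)*p); p++) ;
    if (*p != '"') return 0;
    for (p++; *p && *p != '"' && k + 1 < name_size; p++, k++) name[k] = *p;
    if (*p != '"' || k == 0) return 0;  /* unterminated, too long, or empty name */
    name[k] = '\0';
    *number = n;
    return 1;
}
```

Note that the assembler stores the number minus one in *line\_no*, because the directive describes the line that follows it; the sketch simply reports the number as written.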

**39.** Archaic versions of the C library do not define FILENAME\_MAX.

```
⟨Preprocessor definitions 31⟩ +≡
#ifndef FILENAME_MAX
#define FILENAME_MAX 256
#endif
```

**40.** ⟨Local variables 40⟩ ≡

register Char \*p, \*q; /\* the place where we're currently scanning \*/

See also section 65.

This code is used in section 136.

41. The next several subroutines are useful for preparing a listing of the assembled results. In such a listing, which the user can request with a command line option, we fill the leftmost 19 columns with a representation of the output that has been assembled from the input in the buffer. Sometimes the assembled output requires more than one line, because we have room to output only a tetrabyte per line.

The *flush\_listing\_line* subroutine is called when we have finished generating one line's worth of assembled material. Its parameter is a string to be printed between the assembled material and the buffer contents, if the input line hasn't yet been echoed. The length of this string should be 19 minus the number of characters already printed on the current line of the listing.

```
⟨Subroutines 28⟩ +≡
  void flush_listing_line ARGS((char *));
  void flush_listing_line(s)
    char *s;
  {
    if (line_listed) fprintf(listing_file, "\n");
    else {
      fprintf(listing_file, "%s%s\n", s, buffer);
      line_listed = true;
    }
  }
```

```
ARGS = macro (), §31.
bool = enum, §26.
buffer: Char *, §33.
calloc: void *(), <stdlib.h>.
Char = char, §30.
EOF = (-1), <stdio.h>.
err = macro (), §45.
k: register int, §136.
listing_file: FILE *, §139.
panic = macro (), §45.
src_file: FILE *, §139.
stderr: FILE *, <stdio.h>.
strcmp: int (), <string.h>.
true = 1, §26.
```

**42.** Only the three least significant hex digits of a location are shown on the listing, unless the other digits have changed. The following subroutine prints an extra line when a change needs to be shown.

```
⟨Subroutines 28⟩ +≡
  void update_listing_loc ARGS((int));
  void update_listing_loc(k)
    int k;   /* the location to display, mod 4 */
  {
    if (cur_loc.h ≠ listing_loc.h ∨ ((cur_loc.l ⊕ listing_loc.l) & #fffff000)) {
      fprintf(listing_file, "%08x%08x:", cur_loc.h, (cur_loc.l & -4) | k);
      flush_listing_line("  ");
    }
    listing_loc.h = cur_loc.h; listing_loc.l = (cur_loc.l & -4) | k;
  }

43. ⟨Global variables 27⟩ +≡
  octa cur_loc;                 /* current location of assembled output */
  octa listing_loc;             /* current location on the listing */
  unsigned char hold_buf[4];    /* assembled bytes */
  unsigned char held_bits;      /* which bytes of hold_buf are active? */
  unsigned char listing_bits;   /* which of them haven't been listed yet? */
  bool spec_mode;               /* are we between BSPEC and ESPEC? */
  tetra spec_mode_loc;          /* number of bytes in the current special output */
```
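The test that decides whether section 42 prints an extra location line amounts to comparing everything except the low three hex digits. A minimal sketch, using one 64-bit integer in place of the two halves of an **octa** and the hypothetical name `needs_location_line`:

```c
#include <stdint.h>

/* Nonzero if cur and listed differ anywhere above their three low hex digits. */
int needs_location_line(uint64_t cur, uint64_t listed)
{
    return ((cur ^ listed) & ~(uint64_t)0xfff) != 0;
}
```

Moving from #12345678c to #123456790 needs no extra line, for example, but a jump to #12345a790 does.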

**44.** When bytes are assembled, they are placed into the *hold\_buf*. More precisely, a byte assembled for a location that is j plus a multiple of 4 is placed into *hold\_buf*[j]; two auxiliary variables, *held\_bits* and *listing\_bits*, are then increased by $1 \ll j$. Furthermore, *listing\_bits* is increased by $^{\#}10 \ll j$ if that byte is a future reference to be resolved later.

The bytes are held until we need to output them. The *listing\_clear* routine lists any that have been held but not yet shown. It should be called only when *listing\_bits*  $\neq 0$ .

```
listing_bits = 0;
```
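The bookkeeping described at the start of section 44 can be sketched as follows. The structure `hold_state` and the function `hold_byte` are hypothetical; the sketch uses bitwise or where the text says "increased", which gives the same result as long as the bit in question was not already set.

```c
#include <stdint.h>

/* Hypothetical grouping of the three variables named in the text. */
typedef struct {
    unsigned char hold_buf[4];   /* assembled bytes */
    unsigned char held_bits;     /* which bytes of hold_buf are active */
    unsigned char listing_bits;  /* which of them haven't been listed yet */
} hold_state;

/* Hold byte b, assembled for location loc; is_future marks a future reference. */
void hold_byte(hold_state *s, uint64_t loc, unsigned char b, int is_future)
{
    int j = (int)(loc & 3);                  /* position within the tetrabyte */
    s->hold_buf[j] = b;
    s->held_bits |= (unsigned char)(1 << j);
    s->listing_bits |= (unsigned char)(1 << j);
    if (is_future)
        s->listing_bits |= (unsigned char)(0x10 << j);
}
```

After the bytes 'a' and 'b' have been held for two consecutive locations starting at a multiple of 4, both bit masks equal 3.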

BSPEC = #104, §62. ESPEC = #105, §62. flush\_listing\_line: void (), §41. fprintf: int (), <stdio.h>. h: tetra, §26. l: tetra, §26. listing\_file: FILE \*, §139. octa = struct, §26. tetra = unsigned int, §26.

**45.** Error messages are written to *stderr*. If the message begins with '\*' it is merely a warning; if it begins with '!' it is fatal; otherwise the error is probably serious enough to make manual correction necessary, yet it is not tragic. Errors and warnings appear also on the optional listing file.

```
#define err(m) { report_error(m); if (m[0] ≠ '*') goto bypass; }
#define derr(m, p) { sprintf(err_buf, m, p);
            report_error(err_buf); if (err_buf[0] ≠ '*') goto bypass; }
#define dderr(m, p, q) { sprintf(err_buf, m, p, q);
            report_error(err_buf); if (err_buf[0] ≠ '*') goto bypass; }
#define panic(m) { sprintf(err_buf, "!%s", m); report_error(err_buf); }
#define dpanic(m, p) { err_buf[0] = '!'; sprintf(err_buf + 1, m, p); report_error(err_buf); }
⟨ Subroutines 28 ⟩ +≡
  void report_error ARGS((char *));
  void report_error(message)
       char *message;
  {
    if (¬filename[cur_file]) filename[cur_file] = "(nofile)";
    if (message[0] ≡ '*')
      fprintf(stderr, "\"%s\", line %d warning: %s\n",
            filename[cur_file], line_no, message + 1);
    else if (message[0] ≡ '!')
      fprintf(stderr, "\"%s\", line %d fatal error: %s\n",
            filename[cur_file], line_no, message + 1);
    else {
      fprintf(stderr, "\"%s\", line %d: %s!\n", filename[cur_file], line_no, message);
      err_count++;
    }
    if (listing_file) {
      if (message[0] ≡ '*')
        fprintf(listing_file, "*********** warning: %s\n", message + 1);
      else if (message[0] ≡ '!')
        fprintf(listing_file, "******* fatal error: %s!\n", message + 1);
      else fprintf(listing_file, "******** error: %s!\n", message);
    }
    if (message[0] ≡ '!') exit(-2);
  }

46. ⟨ Global variables 27 ⟩ +≡
  int err_count;  /* this many errors were found */
```
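The prefix convention of §45 can be isolated in a few lines of C. The following is only a sketch: the names `severity`, `classify`, and `message_text` are hypothetical and do not occur in MMIXAL; only the meaning of a leading '\*' or '!' is taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the '*' and '!' prefixes come from the text. */
typedef enum { SEV_WARNING, SEV_ERROR, SEV_FATAL } severity;

/* Classify a message by its first character. */
severity classify(const char *message)
{
    assert(message != NULL);
    if (message[0] == '*') return SEV_WARNING;  /* merely a warning */
    if (message[0] == '!') return SEV_FATAL;    /* assembly cannot continue */
    return SEV_ERROR;                           /* serious, but not tragic */
}

/* The text to print: the prefix character is skipped for
   warnings and fatal errors, and kept for ordinary errors. */
const char *message_text(const char *message)
{
    return classify(message) == SEV_ERROR ? message : message + 1;
}
```

Encoding the severity in the first character lets a single string argument carry both the class and the text, which is why the macros of §45 need no extra parameter.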

**47.** Output to the binary *obj\_file* occurs four bytes at a time. The bytes are assembled in small buffers, not output as single tetrabytes, because we want the output to be big-endian even when the assembler is running on a little-endian machine.

```
#define mmo_write(buf)
          if (fwrite(buf, 1, 4, obj_file) ≠ 4) dpanic("Can't write on %s", obj_file_name)
```

```
⟨ Subroutines 28 ⟩ +≡
  void mmo_clear ARGS((void));
  void mmo_out ARGS((void));
  unsigned char lop_quote_command[4] = {mm, lop_quote, 0, 1};
  void mmo_clear()  /* clears hold_buf, when held_bits ≠ 0 */
  {
    if (hold_buf[0] ≡ mm) mmo_write(lop_quote_command);
    mmo_write(hold_buf);
    if (listing_file ∧ listing_bits) listing_clear();
    held_bits = 0;
    hold_buf[0] = hold_buf[1] = hold_buf[2] = hold_buf[3] = 0;
    mmo_cur_loc = incr(mmo_cur_loc, 4); mmo_cur_loc.l &= -4;
    if (mmo_line_no) mmo_line_no++;
  }
  unsigned char mmo_buf[4];
  int mmo_ptr;
  void mmo_out()  /* output the contents of mmo_buf */
  {
    if (held_bits) mmo_clear();
    mmo_write(mmo_buf);
  }
```
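The two ideas of §47, byte-at-a-time big-endian output and the quoting of data that begins with the escape byte, can be sketched independently of the file handling. The function names `tetra_to_bytes` and `emit_data_tetra` are assumptions made for this illustration; the values *mm* = #98 and *lop\_quote* = #0 are those given in the cross-reference index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MM 0x98u   /* the escape byte mm */

/* Split a 32-bit value into four bytes, most significant first,
   so that the result does not depend on the host's byte order. */
void tetra_to_bytes(uint32_t t, unsigned char buf[4])
{
    buf[0] = (unsigned char)(t >> 24);
    buf[1] = (unsigned char)((t >> 16) & 0xff);
    buf[2] = (unsigned char)((t >> 8) & 0xff);
    buf[3] = (unsigned char)(t & 0xff);
}

/* Append one data tetra to out and return the number of bytes written.
   A tetra whose first byte equals MM is preceded by the command
   {mm, lop_quote, 0, 1}; out must have room for 8 bytes. */
size_t emit_data_tetra(uint32_t t, unsigned char *out)
{
    static const unsigned char quote[4] = { MM, 0x00, 0, 1 };
    unsigned char buf[4];
    size_t n = 0;
    tetra_to_bytes(t, buf);
    if (buf[0] == MM)
        for (size_t i = 0; i < 4; i++) out[n++] = quote[i];
    for (size_t i = 0; i < 4; i++) out[n++] = buf[i];
    return n;
}
```

Shifting and masking, rather than writing the integer's memory image, is what makes the output identical on big-endian and little-endian hosts.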

```
ARGS = macro (), §31.
bypass: label, §102.
cur_file: int, §36.
err_buf: Char *, §33.
exit: void (), <stdlib.h>.
filename: Char *[], §37.
flush_listing_line: void (), §41.
fprintf: int (), <stdio.h>.
fwrite: size_t (), <stdio.h>.
held_bits: unsigned char, §43.
```

```
hold_buf: unsigned char [], §43.
incr: octa (), MMIX-ARITH §6.
l: tetra, §26.
line_listed: bool, §36.
line_no: int, §36.
listing_bits: unsigned char, §43.
listing_clear: void (), §44.
```

listing\_file: FILE \*, §139. lop\_quote = #0, §24. mm = #98, §22. mmo\_cur\_loc: octa, §51. mmo\_line\_no: int, §51. obj\_file: FILE \*, §139. obj\_file\_name: char [], §139. sprintf: int (), <stdio.h>. stderr: FILE \*, <stdio.h>.

```
48. ⟨ Subroutines 28 ⟩ +≡
  void mmo_tetra ARGS((tetra));
  void mmo_byte ARGS((unsigned char));
  void mmo_lop ARGS((char, unsigned char, unsigned char));
  void mmo_lopp ARGS((char, unsigned short));
  void mmo_tetra(t)  /* output a tetrabyte */
       tetra t;
  {
    mmo_buf[0] = t ≫ 24; mmo_buf[1] = (t ≫ 16) & #ff;
    mmo_buf[2] = (t ≫ 8) & #ff; mmo_buf[3] = t & #ff;
    mmo_out();
  }
  void mmo_byte(b)
       unsigned char b;
  {
    mmo_buf[(mmo_ptr++) & 3] = b;
    if (¬(mmo_ptr & 3)) mmo_out();
  }
  void mmo_lop(x, y, z)  /* output a loader operation */
       char x;
       unsigned char y, z;
  {
    mmo_buf[0] = mm; mmo_buf[1] = x; mmo_buf[2] = y; mmo_buf[3] = z;
    mmo_out();
  }
  void mmo_lopp(x, yz)  /* output a loader operation with two-byte operand */
       char x;
       unsigned short yz;
  {
    mmo_buf[0] = mm; mmo_buf[1] = x; mmo_buf[2] = yz ≫ 8; mmo_buf[3] = yz & #ff;
    mmo_out();
  }
```

**49.** The *mmo\_loc* subroutine makes the current location in the object file equal to *cur\_loc*.

```
⟨ Subroutines 28 ⟩ +≡
  void mmo_loc ARGS((void));
  void mmo_loc()
  {
    octa o;
    if (held_bits) mmo_clear();
    o = ominus(cur_loc, mmo_cur_loc);
    if (o.h ≡ 0 ∧ o.l < #10000) {
      if (o.l) mmo_lopp(lop_skip, o.l);
    } else {
      if (cur_loc.h & #ffffff) {
        mmo_lop(lop_loc, 0, 2);
        mmo_tetra(cur_loc.h);
      } else mmo_lop(lop_loc, cur_loc.h ≫ 24, 1);
      mmo_tetra(cur_loc.l);
    }
    mmo_cur_loc = cur_loc;
  }
```
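The case analysis inside *mmo\_loc* can be studied apart from the output routines. The sketch below uses a 64-bit integer in place of the **octa** pair and returns a hypothetical `loc_plan` code instead of writing anything; both are assumptions of this illustration, not part of MMIXAL.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { LOC_NOTHING, LOC_SKIP, LOC_ONE_TETRA, LOC_TWO_TETRAS } loc_plan;

/* Decide how the object-file location would move from mmo_cur to cur. */
loc_plan plan_loc(uint64_t cur, uint64_t mmo_cur)
{
    uint64_t delta = cur - mmo_cur;          /* wraps modulo 2^64, like ominus */
    uint32_t cur_h = (uint32_t)(cur >> 32);  /* high tetra of the target */
    if (delta < 0x10000u)                    /* o.h == 0 and o.l < #10000 */
        return delta ? LOC_SKIP : LOC_NOTHING;
    if (cur_h & 0xffffffu)                   /* low 24 bits of the high tetra are nonzero */
        return LOC_TWO_TETRAS;               /* lop_loc 0,2 then both tetras */
    return LOC_ONE_TETRA;                    /* lop_loc (h >> 24),1 then the low tetra */
}
```

A short forward move costs a single *lop\_skip*; anything else, including a backward move, needs *lop\_loc* followed by one or two tetrabytes.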

**50.** Similarly, the *mmo\_sync* subroutine makes sure that the current file and line number in the output file agree with *cur\_file* and *line\_no*.

```
⟨ Subroutines 28 ⟩ +≡
  void mmo_sync ARGS((void));
  void mmo_sync()
  {
    register int j;
    register unsigned char *p;
    if (cur_file ≠ mmo_cur_file) {
      if (filename_passed[cur_file]) mmo_lop(lop_file, cur_file, 0);
      else {
        mmo_lop(lop_file, cur_file, (strlen(filename[cur_file]) + 3) ≫ 2);
        for (j = 0, p = filename[cur_file]; *p; p++, j = (j + 1) & 3) {
          mmo_buf[j] = *p;
          if (j ≡ 3) mmo_out();
        }
        if (j) {
          for (; j < 4; j++) mmo_buf[j] = 0;
          mmo_out();
        }
        filename_passed[cur_file] = 1;
      }
      mmo_cur_file = cur_file;
      mmo_line_no = 0;
    }
    if (line_no ≠ mmo_line_no) {
      if (line_no ≥ #10000)
        panic("I can't deal with line numbers exceeding 65535");
      mmo_lopp(lop_line, line_no);
      mmo_line_no = line_no;
    }
  }
```
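The first time a file name is passed, *mmo\_sync* announces its length in tetrabytes and pads the last tetrabyte with zeros. A minimal sketch of that packing follows; `pack_filename` is a hypothetical helper that writes into a caller-supplied array instead of calling *mmo\_out*.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy name into out, padded with zero bytes to a multiple of 4.
   Returns the number of tetrabytes used, (strlen(name) + 3) >> 2;
   out must have room for that many tetrabytes. */
size_t pack_filename(const char *name, unsigned char *out)
{
    size_t len = strlen(name);
    size_t tetras = (len + 3) >> 2;          /* round up to whole tetrabytes */
    memcpy(out, name, len);
    memset(out + len, 0, 4 * tetras - len);  /* zero padding, possibly empty */
    return tetras;
}
```

For example, the seven-character name `foo.mms` occupies two tetrabytes, the second ending in one zero byte.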

```
ARGS = macro (), §31.
cur_file: int, §36.
cur_loc: octa, §43.
filename: Char *[], §37.
filename_passed: char [], §51.
h: tetra, §26.
held_bits: unsigned char, §43.
l: tetra, §26.
line_no: int, §36.
lop_file = #6, §24.
lop_line = #7, §24.
lop_loc = #1, §24.
lop_skip = #2, §24.
mm = #98, §22.
mmo_buf: unsigned char [], §47.
mmo_clear: void (), §47.
mmo_cur_file: int, §51.
mmo_cur_loc: octa, §51.
mmo_line_no: int, §51.
mmo_out: void (), §47.
mmo_ptr: int, §47.
octa = struct, §26.
ominus: octa (), MMIX-ARITH §5.
panic = macro (), §45.
strlen: size_t (), <string.h>.
tetra = unsigned int, §26.
```

```
51. ⟨Global variables 27⟩ +≡
octa mmo_cur_loc; /* current location in the object file */
int mmo_line_no; /* current line number in the mmo output so far */
int mmo_cur_file; /* index of the current file in the mmo output so far */
char filename_passed [256]; /* has a filename been recorded in the output? */
```

**52.** Here is a basic subroutine that assembles k bytes starting at  $cur\_loc$ . The value of k should be 1, 2, or 4, and  $cur\_loc$  should be a multiple of k. The  $x\_bits$  parameter tells which bytes, if any, are part of a future reference.

```
⟨ Subroutines 28 ⟩ +≡
  void assemble ARGS((char, tetra, unsigned char));
  void assemble(k, dat, x_bits)
       char k;
       tetra dat;
       unsigned char x_bits;
  {
    register int j, jj, l;
    if (spec_mode) l = spec_mode_loc;
    else {
      l = cur_loc.l;
      ⟨ Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53 ⟩;
      if (¬held_bits ∧ ¬(cur_loc.h & #e0000000)) mmo_sync();
    }
    for (j = 0; j < k; j++) {
      jj = (l + j) & 3;
      hold_buf[jj] = (dat ≫ (8 * (k - 1 - j))) & #ff;
      held_bits |= 1 ≪ jj;
      listing_bits |= 1 ≪ jj;
    }
    listing_bits |= x_bits;
    if (((l + k) & 3) ≡ 0) {
      if (listing_file) listing_clear();
      mmo_clear();
    }
    if (spec_mode) spec_mode_loc += k;
    else cur_loc = incr(cur_loc, k);
  }
```
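The bookkeeping of §44 and §52 is easier to follow when the holding buffer is separated from the file output. In the sketch below the struct `holder` and the function `hold_bytes` are hypothetical; flushing, listing, and location synchronization are omitted, and the return value merely reports whether the tetrabyte has been completed.

```c
#include <assert.h>
#include <stdint.h>

/* A simplified holding buffer; writing to the object file is left out. */
typedef struct {
    unsigned char hold_buf[4];
    unsigned char held_bits;     /* bit jj set: hold_buf[jj] is active */
    unsigned char listing_bits;  /* low 4 bits: unlisted bytes; high 4 bits: future references */
} holder;

/* Place the k low-order bytes of dat (k = 1, 2, or 4) at location loc,
   most significant byte first. Returns 1 if the tetrabyte is now complete. */
int hold_bytes(holder *h, uint32_t loc, int k, uint32_t dat, unsigned char x_bits)
{
    assert(k == 1 || k == 2 || k == 4);
    assert(loc % (uint32_t)k == 0);          /* loc must be a multiple of k */
    for (int j = 0; j < k; j++) {
        int jj = (int)((loc + (uint32_t)j) & 3);
        h->hold_buf[jj] = (unsigned char)((dat >> (8 * (k - 1 - j))) & 0xff);
        h->held_bits |= (unsigned char)(1 << jj);
        h->listing_bits |= (unsigned char)(1 << jj);
    }
    h->listing_bits |= x_bits;               /* mark future references, #10 << j */
    return ((loc + (uint32_t)k) & 3) == 0;
}
```

Assembling a wyde at location #100 and two bytes at #102 and #103 fills the buffer; only the last call reports completion, which is the moment at which *assemble* would call *mmo\_clear*.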

**53.** ⟨ Make sure *cur\_loc* and *mmo\_cur\_loc* refer to the same tetrabyte 53 ⟩ ≡

if (*cur\_loc*.*h* ≠ *mmo\_cur\_loc*.*h* ∨ ((*cur\_loc*.*l* ⊕ *mmo\_cur\_loc*.*l*) & #fffffffc)) *mmo\_loc*();

This code is used in section 52.

**54.** The symbol table. Symbols are stored and retrieved by means of a ternary search trie, following ideas of Bentley and Sedgewick. (See ACM-SIAM Symp. on Discrete Algorithms 8 (1997), 360–369; R. Sedgewick, Algorithms in C (Reading, Mass.: Addison-Wesley, 1998), §15.4.) Each trie node stores a character, and there are branches to subtries for the cases where a given character is less than, equal to, or greater than the character in the trie. There also is a pointer to a symbol table entry if a symbol ends at the current node.

```
⟨ Type definitions 26 ⟩ +≡
  typedef struct ternary_trie_struct {
    unsigned short ch;  /* the (possibly wyde) character stored here */
    struct ternary_trie_struct *left, *mid, *right;  /* downward in the ternary trie */
    struct sym_tab_struct *sym;  /* equivalents of symbols */
  } trie_node;

55. We allocate trie nodes in chunks of 1000 at a time.

⟨ Subroutines 28 ⟩ +≡
  trie_node *new_trie_node ARGS((void));
  trie_node *new_trie_node()
  {
    register trie_node *t = next_trie_node;
    if (t ≡ last_trie_node) {
      t = (trie_node *) calloc(1000, sizeof(trie_node));
      if (¬t) panic("Capacity exceeded: Out of trie memory");
      last_trie_node = t + 1000;
    }
    next_trie_node = t + 1;
    return t;
  }

56. ⟨ Global variables 27 ⟩ +≡
  trie_node *trie_root;  /* root of the trie */
  trie_node *op_root;  /* root of subtrie for opcodes */
  trie_node *next_trie_node, *last_trie_node;  /* allocation control */
  trie_node *cur_prefix;  /* root of subtrie for unqualified symbols */
```

```
ARGS = macro (), §31.
calloc: void *(), <stdlib.h>.
cur_loc: octa, §43.
h: tetra, §26.
held_bits: unsigned char, §43.
hold_buf: unsigned char [],
§43.
incr: octa (), MMIX-ARITH §6.
```

l: tetra, §26.

listing\_bits: unsigned char, §43.

listing\_clear: void (), §44.

listing\_file: FILE \*, §139.

mmo\_clear: void (), §47.

mmo\_loc: void (), §49.

mmo\_sync: void (), §50.

octa = struct, §26.

panic = macro (), §45.

spec\_mode: bool, §43.

spec\_mode\_loc: tetra, §43.

sym\_tab\_struct: struct, §58.

tetra = unsigned int, §26.

**57.** The *trie\_search* subroutine starts at a given node of the trie and finds a given string in its middle subtrie, inserting new nodes if necessary. The string ends with the first nonletter or nondigit; the location of the terminating character is stored in global variable *terminator*.

```
#define isletter(c) (isalpha(c) ∨ c ≡ '_' ∨ c ≡ ':' ∨ (unsigned int)(c) > 126)
⟨ Subroutines 28 ⟩ +≡
  trie_node *trie_search ARGS((trie_node *, Char *));
  Char *terminator;  /* where the search ended */
  trie_node *trie_search(t, s)
       trie_node *t;
       Char *s;
  {
    register trie_node *tt = t;
    register unsigned char *p = (unsigned char *) s;
    while (1) {
      if (¬isletter(*p) ∧ ¬isdigit(*p)) {
        terminator = (Char *) p; return tt;
      }
      if (tt→mid) {
        tt = tt→mid;
        while (*p ≠ tt→ch) {
          if (*p < tt→ch) {
            if (tt→left) tt = tt→left;
            else {
              tt→left = new_trie_node(); tt = tt→left; goto store_new_char;
            }
          } else {
            if (tt→right) tt = tt→right;
            else {
              tt→right = new_trie_node(); tt = tt→right; goto store_new_char;
            }
          }
        }
        p++;
      } else {
        tt→mid = new_trie_node(); tt = tt→mid;
      store_new_char: tt→ch = *p++;
      }
    }
  }
```
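For readers who want to experiment with the data structure outside MMIXAL, here is a small self-contained sketch of the same search-or-insert idea. It makes several simplifying assumptions that are not in the original: the node type `tnode` carries no symbol pointer, nodes are allocated one at a time, the string ends at its NUL byte rather than at the first nonletter, and a pointer-to-pointer replaces the **goto**.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tnode {
    unsigned char ch;                  /* the character stored here */
    struct tnode *left, *mid, *right;  /* downward in the ternary trie */
} tnode;

/* Find the node for string s in the middle subtrie of t, creating
   nodes as needed. Returns NULL if memory runs out. */
tnode *tst_search(tnode *t, const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    for (; *p; p++) {
        tnode **link = &t->mid;        /* candidates for this position hang off mid */
        while (*link && (*link)->ch != *p)
            link = (*p < (*link)->ch) ? &(*link)->left : &(*link)->right;
        if (!*link) {                  /* no node for this character yet */
            *link = calloc(1, sizeof(tnode));
            if (!*link) return NULL;
            (*link)->ch = *p;
        }
        t = *link;                     /* descend; the next character goes below mid */
    }
    return t;
}
```

Searching for `ADD` and then `ADDU` shares the first three nodes, while `AND` branches to the right of the first `D`; a repeated search returns the node found before.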

**58.** Symbol table nodes hold the serial numbers and equivalents of defined symbols. They also hold "fixup information" for undefined symbols; this will allow the loader to correct any previously assembled instructions that refer to such symbols when they are eventually defined.

In the symbol table node for a defined symbol, the *link* field has one of the special codes DEFINED or REGISTER or PREDEFINED, and the *equiv* field holds the defined value. The *serial* number is a unique identifier for all user-defined symbols.

In the symbol table node for an undefined symbol, the equiv field is ignored. The link field points to the first node of fixup information; that node is, in turn, a symbol table node that might link to other fixups. The serial number in a fixup node is either 0 or 1 or 2, meaning respectively "fixup the octabyte pointed to by equiv" or "fixup the relative address in the YZ field of the instruction pointed to by equiv" or "fixup the relative address in the XYZ field of the instruction pointed to by equiv."

```
#define DEFINED (sym_node *) 1  /* code value for octabyte equivalents */
#define REGISTER (sym_node *) 2  /* code value for register-number equivalents */
#define PREDEFINED (sym_node *) 3  /* code value for not-yet-used equivalents */
#define fix_o 0  /* serial code for octabyte fixup */
#define fix_yz 1  /* serial code for relative fixup */
#define fix_xyz 2  /* serial code for JMP fixup */
⟨ Type definitions 26 ⟩ +≡
  typedef struct sym_tab_struct {
    int serial;  /* serial number of symbol; type number for fixups */
    struct sym_tab_struct *link;  /* DEFINED status or link to fixup */
    octa equiv;  /* the equivalent value */
  } sym_node;
```

isdigit: int (), <ctype.h>.

left: trie\_node \*, §54.

mid: trie\_node \*, §54.

new\_trie\_node: trie\_node \*(), §55.

octa = struct, §26.

right: trie\_node \*, §54.

trie\_node = struct, §54.

**59.** The allocation of new symbol table nodes proceeds in chunks, like the allocation of trie nodes. But in this case we also have the possibility of reusing old fixup nodes that are no longer needed.

```
#define recycle_fixup(pp) pp→link = sym_avail, sym_avail = pp
⟨ Subroutines 28 ⟩ +≡
  sym_node *new_sym_node ARGS((bool));
  sym_node *new_sym_node(serialize)
       bool serialize;  /* should the new node receive a unique serial number? */
  {
    register sym_node *p = sym_avail;
    if (p) {
      sym_avail = p→link; p→link = Λ; p→serial = 0; p→equiv = zero_octa;
    } else {
      p = next_sym_node;
      if (p ≡ last_sym_node) {
        p = (sym_node *) calloc(1000, sizeof(sym_node));
        if (¬p) panic("Capacity exceeded: Out of symbol memory");
        last_sym_node = p + 1000;
      }
      next_sym_node = p + 1;
    }
    if (serialize) p→serial = ++serial_number;
    return p;
  }

60. ⟨ Global variables 27 ⟩ +≡
  int serial_number;
  sym_node *sym_root;  /* root of the sym */
  sym_node *next_sym_node, *last_sym_node;  /* allocation control */
  sym_node *sym_avail;  /* stack of recycled symbol table nodes */
```
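The allocation discipline of §59 combines two classical techniques: carving nodes out of large chunks, and keeping a stack of recycled nodes that is consulted first. The sketch below shows the combination in isolation; the names `node`, `new_node`, and `recycle` are hypothetical, and the node carries only the fields needed for the demonstration.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK 1000   /* nodes per calloc, as in the text */

typedef struct node {
    struct node *link;
    int serial;
} node;

static node *next_node, *last_node;   /* unused part of the current chunk */
static node *avail;                   /* stack of recycled nodes */

/* Push a node that is no longer needed onto the recycle stack. */
void recycle(node *p) { p->link = avail; avail = p; }

/* Get a cleared node: reuse a recycled one if possible,
   otherwise take the next node of the current chunk. */
node *new_node(void)
{
    node *p = avail;
    if (p) {
        avail = p->link;
        p->link = NULL;
        p->serial = 0;
    } else {
        p = next_node;
        if (p == last_node) {          /* chunk exhausted, or none allocated yet */
            p = calloc(CHUNK, sizeof(node));
            if (!p) return NULL;
            last_node = p + CHUNK;
        }
        next_node = p + 1;
    }
    return p;
}
```

Because recycled nodes are reused before the chunk pointer advances, a burst of fixups that are resolved quickly does not enlarge the symbol memory.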

**61.** We initialize the trie by inserting all the predefined symbols. Opcodes are given the prefix ^, to distinguish them from ordinary symbols; this character nicely divides uppercase letters from lowercase letters.

```
⟨ Initialize everything 29 ⟩ +≡
    trie_root = new_trie_node();
    cur_prefix = trie_root;
    op_root = new_trie_node();
    trie_root→mid = op_root;
    trie_root→ch = ':';
    op_root→ch = '^';
    ⟨ Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64⟩;
    ⟨ Put other predefined symbols into the trie 70⟩;
```

**62.** Most of the assembly work can be table driven, based on bits that are stored as the "equivalents" of opcode symbols like ^ADD.

```
#define rel_addr_bit #1  /* is YZ or XYZ relative? */
#define immed_bit #2  /* should opcode be immediate if Z or YZ not register? */
#define zar_bit #4  /* should register status of Z be ignored? */
#define zr_bit #8  /* must Z be a register? */
#define yar_bit #10  /* should register status of Y be ignored? */
#define yr_bit #20  /* must Y be a register? */
#define xar_bit #40  /* should register status of X be ignored? */
#define xr_bit #80  /* must X be a register? */
#define yzar_bit #100  /* should register status of YZ be ignored? */
#define yzr_bit #200  /* must YZ be a register? */
#define xyzar_bit #400  /* should register status of XYZ be ignored? */
#define xyzr_bit #800  /* must XYZ be a register? */
#define one_arg_bit #1000  /* is it OK to have zero or one operand? */
#define two_arg_bit #2000  /* is it OK to have exactly two operands? */
#define three_arg_bit #4000  /* is it OK to have exactly three operands? */
#define many_arg_bit #8000  /* is it OK to have more than three operands? */
#define align_bits #30000  /* how much alignment: byte, wyde, tetra, or octa? */
#define no_label_bit #40000  /* should the label be blank? */
#define mem_bit #80000  /* must YZ be a memory reference? */
#define spec_bit #100000  /* is this opcode allowed in SPEC mode? */
⟨ Type definitions 26 ⟩ +≡
  typedef struct {
    Char *name;  /* symbolic opcode */
    short code;  /* numeric opcode */
    int bits;  /* treatment of operands */
  } op_spec;
  typedef enum {
    SET = #100, IS, LOC, PREFIX, BSPEC, ESPEC, GREG, LOCAL,
    BYTE, WYDE, TETRA, OCTA
  } pseudo_op;
```
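To see how such a bit pattern drives the assembler, one can decode a few of the fields by hand. The helper functions `operand_count_ok` and `alignment_bytes` below are hypothetical; the second assumes that the two *align\_bits* hold the base-2 logarithm of the alignment, which agrees with the table entries for BYTE, WYDE, TETRA, and OCTA in §63.

```c
#include <assert.h>

#define immed_bit     0x2
#define zr_bit        0x8
#define yr_bit        0x20
#define xr_bit        0x80
#define one_arg_bit   0x1000
#define two_arg_bit   0x2000
#define three_arg_bit 0x4000
#define many_arg_bit  0x8000
#define align_bits    0x30000

/* Is an instruction with n operands acceptable for this opcode? */
int operand_count_ok(int bits, int n)
{
    if (n <= 1) return (bits & one_arg_bit) != 0;
    if (n == 2) return (bits & two_arg_bit) != 0;
    if (n == 3) return (bits & three_arg_bit) != 0;
    return (bits & many_arg_bit) != 0;
}

/* Alignment as a byte count: 1, 2, 4, or 8. */
int alignment_bytes(int bits)
{
    return 1 << ((bits & align_bits) >> 16);
}
```

For instance, the entry #240a2 used for ADD demands exactly three operands, requires X and Y to be registers, lets Z be immediate, and aligns the location to a tetrabyte.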

```
ARGS = macro (), §31.
bool = enum, §26.
calloc: void *(), <stdlib.h>.
ch: unsigned short, §54.
Char = char, §30.
cur_prefix: trie_node *, §56.
equiv: octa, §58.
link: sym_node *, §58.
mid: trie_node *, §54.
new_trie_node: trie_node *(), §55.
op_root: trie_node *, §56.
panic = macro (), §45.
serial: int, §58.
sym_node = struct, §58.
trie_root: trie_node *, §56.
zero_octa: octa, MMIX-ARITH §4.
```

```
63. ⟨ Global variables 27 ⟩ +≡
  op_spec op_init_table[] = {
  {"TRAP", #00, #27554}, {"FCMP", #01, #240a8},
  {"FUN", #02, #240a8}, {"FEQL", #03, #240a8},
  {"FADD", #04, #240a8}, {"FIX", #05, #26288},
  {"FSUB", #06, #240a8}, {"FIXU", #07, #26288},
  {"FLOT", #08, #26282}, {"FLOTU", #0a, #26282},
  {"SFLOT", #0c, #26282}, {"SFLOTU", #0e, #26282},
  {"FMUL", #10, #240a8}, {"FCMPE", #11, #240a8},
  {"FUNE", #12, #240a8}, {"FEQLE", #13, #240a8},
  {"FDIV", #14, #240a8}, {"FSQRT", #15, #26288},
  {"FREM", #16, #240a8}, {"FINT", #17, #26288},
  {"MUL", #18, #240a2}, {"MULU", #1a, #240a2},
  {"DIV", #1c, #240a2}, {"DIVU", #1e, #240a2},
  {"ADD", #20, #240a2}, {"ADDU", #22, #240a2},
  {"SUB", #24, #240a2}, {"SUBU", #26, #240a2},
  {"2ADDU", #28, #240a2}, {"4ADDU", #2a, #240a2},
  {"8ADDU", #2c, #240a2}, {"16ADDU", #2e, #240a2},
  {"CMP", #30, #240a2}, {"CMPU", #32, #240a2},
  {"NEG", #34, #26082}, {"NEGU", #36, #26082},
  {"SL", #38, #240a2}, {"SLU", #3a, #240a2},
  {"SR", #3c, #240a2}, {"SRU", #3e, #240a2},
  {"BN", #40, #22081}, {"BZ", #42, #22081},
  {"BP", #44, #22081}, {"BOD", #46, #22081},
  {"BNN", #48, #22081}, {"BNZ", #4a, #22081},
  {"BNP", #4c, #22081}, {"BEV", #4e, #22081},
  {"PBN", #50, #22081}, {"PBZ", #52, #22081},
  {"PBP", #54, #22081}, {"PBOD", #56, #22081},
  {"PBNN", #58, #22081}, {"PBNZ", #5a, #22081},
  {"PBNP", #5c, #22081}, {"PBEV", #5e, #22081},
  {"CSN", #60, #240a2}, {"CSZ", #62, #240a2},
  {"CSP", #64, #240a2}, {"CSOD", #66, #240a2},
  {"CSNN", #68, #240a2}, {"CSNZ", #6a, #240a2},
  {"CSNP", #6c, #240a2}, {"CSEV", #6e, #240a2},
  {"ZSN", #70, #240a2}, {"ZSZ", #72, #240a2},
  {"ZSP", #74, #240a2}, {"ZSOD", #76, #240a2},
  {"ZSNN", #78, #240a2}, {"ZSNZ", #7a, #240a2},
  {"ZSNP", #7c, #240a2}, {"ZSEV", #7e, #240a2},
  {"LDB", #80, #a60a2}, {"LDBU", #82, #a60a2},
  {"LDW", #84, #a60a2}, {"LDWU", #86, #a60a2},
  {"LDT", #88, #a60a2}, {"LDTU", #8a, #a60a2},
  {"LDO", #8c, #a60a2}, {"LDOU", #8e, #a60a2},
  {"LDSF", #90, #a60a2}, {"LDHT", #92, #a60a2},
  {"CSWAP", #94, #a60a2}, {"LDUNC", #96, #a60a2},
  {"LDVTS", #98, #a60a2}, {"PRELD", #9a, #a6022},
  {"PREGO", #9c, #a6022}, {"GO", #9e, #a60a2},
  {"STB", #a0, #a60a2}, {"STBU", #a2, #a60a2},
  {"STW", #a4, #a60a2}, {"STWU", #a6, #a60a2},
  {"STT", #a8, #a60a2}, {"STTU", #aa, #a60a2},
{"STO", #ac, #a60a2}, {"STOU", #ae, #a60a2},
{"STSF", #b0, #a60a2}, {"STHT", #b2, #a60a2},
{"STCO", #b4, #a6022}, {"STUNC", #b6, #a60a2},
{"SYNCD", #b8, #a6022}, {"PREST", #ba, #a6022},
{"SYNCID", #bc, #a6022}, {"PUSHGO", #be, #a6062},
{"OR", #c0, #240a2}, {"ORN", #c2, #240a2},
{"NOR", #c4, #240a2}, {"XOR", #c6, #240a2},
{"AND", #c8, #240a2}, {"ANDN", #ca, #240a2},
{"NAND", #cc, #240a2}, {"NXOR", #ce, #240a2},
{"BDIF", #d0, #240a2}, {"WDIF", #d2, #240a2},
{"TDIF", #d4, #240a2}, {"ODIF", #d6, #240a2},
{"MUX", #d8, #240a2}, {"SADD", #da, #240a2},
{"MOR", #dc, #240a2}, {"MXOR", #de, #240a2},
{"SETH", #e0, #22080}, {"SETMH", #e1, #22080},
{"SETML", #e2, #22080}, {"SETL", #e3, #22080},
{"INCH", #e4, #22080}, {"INCMH", #e5, #22080},
{"INCML", #e6, #22080}, {"INCL", #e7, #22080},
{"ORH", #e8, #22080}, {"ORMH", #e9, #22080},
{"ORML", #ea, #22080}, {"ORL", #eb, #22080},
{"ANDNH", #ec, #22080}, {"ANDNMH", #ed, #22080},
{"ANDNML", #ee, #22080}, {"ANDNL", #ef, #22080},
{"JMP", #f0, #21001}, {"PUSHJ", #f2, #22041},
{"GETA", #f4, #22081}, {"PUT", #f6, #22002},
{"POP", #f8, #23000}, {"RESUME", #f9, #21000},
{"SAVE", #fa, #22080}, {"UNSAVE", #fb, #23a00},
{"SYNC", #fc, #21000}, {"SWYM", #fd, #27554},
{"GET", #fe, #22080}, {"TRIP", #ff, #27554},
{"SET", SET, #22180}, {"LDA", #22, #a60a2},
{"IS", IS, #101400}, {"LOC", LOC, #1400},
{"PREFIX", PREFIX, #141000},
{"BYTE", BYTE, #10f000}, {"WYDE", WYDE, #11f000},
{"TETRA", TETRA, #12f000}, {"OCTA", OCTA, #13f000},
{"BSPEC", BSPEC, #41400}, {"ESPEC", ESPEC, #141000},
{"GREG", GREG, #101000}, {"LOCAL", LOCAL, #141800}};
int op_init_size; /* the number of items in op_init_table */
```

```
64. ⟨ Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64 ⟩ ≡
  op_init_size = (sizeof op_init_table)/sizeof(op_spec);
  for (j = 0; j < op_init_size; j++) {
    tt = trie_search(op_root, op_init_table[j].name);
    pp = tt→sym = new_sym_node(false);
    pp→link = PREDEFINED;
    pp→equiv.h = op_init_table[j].code, pp→equiv.l = op_init_table[j].bits;
  }
This code is used in section 61.

65. ⟨ Local variables 40 ⟩ +≡
  register trie_node *tt;
  register sym_node *pp, *qq;

66. ⟨ Put the special register names into the trie 66 ⟩ ≡
  for (j = 0; j < 32; j++) {
    tt = trie_search(trie_root, special_name[j]);
    pp = tt→sym = new_sym_node(false);
    pp→link = PREDEFINED;
    pp→equiv.l = j;
  }
This code is used in section 61.

67. ⟨ Global variables 27 ⟩ +≡
  Char *special_name[32] = {"rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB",
      "rC", "rN", "rO", "rS", "rI", "rT", "rTT", "rK",
      "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
      "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};

68. ⟨ Type definitions 26 ⟩ +≡
  typedef struct {
    Char *name;
    tetra h, l;
  } predef_spec;

69. ⟨ Global variables 27 ⟩ +≡
  predef_spec predefs[] = {
  {"ROUND_CURRENT", 0, 0}, {"ROUND_OFF", 0, 1}, {"ROUND_UP", 0, 2},
  {"ROUND_DOWN", 0, 3}, {"ROUND_NEAR", 0, 4},
  {"Inf", #7ff00000, 0},
  {"Data_Segment", #20000000, 0}, {"Pool_Segment", #40000000, 0},
  {"Stack_Segment", #60000000, 0},
  {"D_BIT", 0, #80}, {"V_BIT", 0, #40}, {"W_BIT", 0, #20}, {"I_BIT", 0, #10},
  {"O_BIT", 0, #08}, {"U_BIT", 0, #04}, {"Z_BIT", 0, #02}, {"X_BIT", 0, #01},
  {"D_Handler", 0, #10}, {"V_Handler", 0, #20}, {"W_Handler", 0, #30},
  {"I_Handler", 0, #40}, {"O_Handler", 0, #50}, {"U_Handler", 0, #60},
  {"Z_Handler", 0, #70}, {"X_Handler", 0, #80},
  {"StdIn", 0, 0}, {"StdOut", 0, 1}, {"StdErr", 0, 2},
  {"TextRead", 0, 0}, {"TextWrite", 0, 1}, {"BinaryRead", 0, 2},
  {"BinaryWrite", 0, 3}, {"BinaryReadWrite", 0, 4},
  {"Halt", 0, 0}, {"Fopen", 0, 1}, {"Fclose", 0, 2}, {"Fread", 0, 3},
  {"Fgets", 0, 4}, {"Fgetws", 0, 5}, {"Fwrite", 0, 6}, {"Fputs", 0, 7},
  {"Fputws", 0, 8}, {"Fseek", 0, 9}, {"Ftell", 0, 10}};
  int predef_size;
```

```
70. ⟨ Put other predefined symbols into the trie 70 ⟩ ≡
  predef_size = (sizeof predefs)/sizeof(predef_spec);
  for (j = 0; j < predef_size; j++) {
    tt = trie_search(trie_root, predefs[j].name);
    pp = tt→sym = new_sym_node(false);
    pp→link = PREDEFINED;
    pp→equiv.h = predefs[j].h, pp→equiv.l = predefs[j].l;
  }
```

This code is used in section 61.

**71.** We place Main into the trie at the beginning of assembly, so that it will show up as an undefined symbol if the user specifies no starting point.

```
⟨ Initialize everything 29 ⟩ +≡
  trie_search(trie_root, "Main")→sym = new_sym_node(true);
```

```
bits: int, §62.
Char = char, §30.
code: short, §62.
equiv: octa, §58.
false = 0, §26.
h: tetra, §26.
j: register int, §136.
l: tetra, §26.
link: sym_node *, §58.
name: Char *, §62.
new_sym_node: sym_node *(), §59.
op_init_size: int, §63.
op_init_table: op_spec [], §63.
op_root: trie_node *, §56.
op_spec = struct, §62.
PREDEFINED = macro, §58.
sym: sym_node *, §54.
sym_node = struct, §58.
tetra = unsigned int, §26.
trie_node = struct, §54.
trie_root: trie_node *, §56.
trie_search: trie_node *(), §57.
true = 1, §26.
```

**72.** At the end of assembly we traverse the entire symbol table, visiting each symbol in lexicographic order and transmitting the trie structure to the output file. We detect any undefined future references at this time.

The order of traversal has a simple recursive pattern: To traverse the subtrie rooted at t, we

```
traverse t→left, if the left subtrie is nonempty;
visit t→sym, if this symbol table entry is present;
traverse t→mid, if the middle subtrie is nonempty;
traverse t→right, if the right subtrie is nonempty.
```

This pattern leads to a compact representation in the mmo file, usually requiring fewer than two bytes per trie node plus the bytes needed to encode the equivalents and serial numbers. Each node of the trie is encoded as a "master byte" followed by the encodings of the left subtrie, character, equivalent, middle subtrie, and right subtrie. The master byte is the sum of

```
#80, if the character occupies two bytes instead of one;
#40, if the left subtrie is nonempty;
#20, if the middle subtrie is nonempty;
#10, if the right subtrie is nonempty;
#01 to #08, if the symbol's equivalent is one to eight bytes long;
#09 to #0e, if the symbol's equivalent is 2^61 plus one to six bytes;
#0f, if the symbol's equivalent is $0 plus one byte;
```

the character is omitted if the middle subtrie and the equivalent are both empty. The "equivalent" of an undefined symbol is zero, but stated as two bytes long. Symbol equivalents are followed by the serial number, represented as a sequence of one or more bytes in radix 128; the final byte of the serial number is tagged by adding 128. (Thus, serial number $2^{14} - 1$ is encoded as #7fff; serial number $2^{14}$ is #010080.)
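The radix-128 representation of serial numbers is easy to check with a few lines of C. This is only a sketch of the encoding rule just stated; the function name `encode_serial` and the five-byte output array are assumptions of the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n in radix 128, most significant digit first; the final byte
   has 128 added. Returns the number of bytes written, which is at most
   5 for a 32-bit value. */
size_t encode_serial(uint32_t n, unsigned char out[5])
{
    unsigned char digits[5];
    size_t count = 0, i;
    do {                         /* collect digits, least significant first */
        digits[count++] = (unsigned char)(n & 0x7f);
        n >>= 7;
    } while (n);
    for (i = 0; i < count; i++)  /* reverse into output order */
        out[i] = digits[count - 1 - i];
    out[count - 1] |= 0x80;      /* tag the final byte */
    return count;
}
```

With this rule, serial number $2^{14} - 1$ yields the two bytes 7f ff, while $2^{14}$ needs the three bytes 01 00 80.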

**73.** First we prune the trie by removing all predefined symbols that the user did not redefine.

```
⟨ Subroutines 28 ⟩ +≡
  trie_node *prune ARGS((trie_node *));
  trie_node *prune(t)
       trie_node *t;
  {
    register int useful = 0;
    if (t→sym) {
      if (t→sym→serial) useful = 1;
      else t→sym = Λ;
    }
    if (t→left) {
      t→left = prune(t→left);
      if (t→left) useful = 1;
    }
    if (t→mid) {
      t→mid = prune(t→mid);
      if (t→mid) useful = 1;
    }
    if (t→right) {
      t→right = prune(t→right);
      if (t→right) useful = 1;
    }
    if (useful) return t;
    else return Λ;
  }
```

**74.** Then we output the trie by following the recursive traversal pattern.

```
⟨ Subroutines 28 ⟩ +≡
  void out_stab ARGS((trie_node *));
  void out_stab(t)
       trie_node *t;
  {
    register int m = 0, j;
    register sym_node *pp;
    if (t→ch > #ff) m += #80;
    if (t→left) m += #40;
    if (t→mid) m += #20;
    if (t→right) m += #10;
    if (t→sym) {
      if (t→sym→link ≡ REGISTER) m += #f;
      else if (t→sym→link ≡ DEFINED) ⟨ Encode the length of t→sym→equiv 76 ⟩
      else if (t→sym→link ∨ t→sym→serial ≡ 1) ⟨ Report an undefined symbol 79 ⟩;
    }
    mmo_byte(m);
    if (t→left) out_stab(t→left);
    if (m & #2f) ⟨ Visit t and traverse t→mid 75 ⟩;
    if (t→right) out_stab(t→right);
  }
```

```
ARGS = macro (), §31.
ch: unsigned short, §54.
DEFINED = macro, §58.
equiv: octa, §58.
left: trie_node *, §54.
link: sym_node *, §58.
mid: trie_node *, §54.
mmo_byte: void (), §48.
REGISTER = macro, §58.
right: trie_node *, §54.
serial: int, §58.
sym: sym_node *, §54.
sym_node = struct, §58.
trie_node = struct, §54.
```

**75.** A global variable called *sym\_buf* holds all characters on middle branches to the current trie node; *sym\_ptr* is the first currently unused character in *sym\_buf*.

```
⟨Visit t and traverse t→mid 75⟩ ≡
  {
    if (m & #80) mmo_byte(t→ch ≫ 8);
    mmo_byte(t→ch & #ff);
    *sym_ptr++ = (m & #80 ? '?' : t→ch);   /* Unicode? not yet */
    m &= #f; if (m ∧ t→sym→link) {
      if (listing_file) ⟨Print symbol sym_buf and its equivalent 78⟩;
      if (m ≡ 15) m = 1;
      else if (m > 8) m -= 8;
      for (; m > 0; m--)
        if (m > 4) mmo_byte((t→sym→equiv.h ≫ (8 * (m - 5))) & #ff);
        else mmo_byte((t→sym→equiv.l ≫ (8 * (m - 1))) & #ff);
      for (m = 0; m < 4; m++)
        if (t→sym→serial < (1 ≪ (7 * (m + 1)))) break;
      for (; m ≥ 0; m--) mmo_byte(((t→sym→serial ≫ (7 * m)) & #7f) + (m ? 0 : #80));
    }
    if (t→mid) out_stab(t→mid);
    sym_ptr--;
  }
```

This code is used in section 74.

**76.**

```
⟨Encode the length of t→sym→equiv 76⟩ ≡
  { register tetra x;
    if ((t→sym→equiv.h & #ffff0000) ≡ #20000000)
      m += 8, x = t→sym→equiv.h - #20000000;   /* data segment */
    else x = t→sym→equiv.h;
    if (x) m += 4; else x = t→sym→equiv.l;
    for (j = 1; j < 4; j++)
      if (x < (1 ≪ (8 * j))) break;
    m += j;
  }
```

This code is used in section 74.
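The length computation of §76 can be tried in isolation. The following is a sketch under stated assumptions: the function name `equiv_length_code` is hypothetical, and the octabyte is passed as two 32-bit halves in place of MMIXAL's **octa** structure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the low four bits of the master byte for a DEFINED
   symbol whose equivalent has high half h and low half l. */
int equiv_length_code(uint32_t h, uint32_t l)
{
  int m = 0, j;
  uint32_t x;
  if ((h & 0xffff0000u) == 0x20000000u) {
    m += 8;                      /* data segment: 2^61 plus an offset */
    x = h - 0x20000000u;
  } else x = h;
  if (x) m += 4;                 /* the low half then needs all four bytes */
  else x = l;
  for (j = 1; j < 4; j++)        /* count the bytes needed for x */
    if (x < ((uint32_t)1 << (8 * j))) break;
  assert(m + j >= 1 && m + j <= 14);
  return m + j;
}
```

A value of #100 needs two bytes and gives code 2; the data-segment address 2^61 + 8 gives code 9, the first of the codes #09 to #0e.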

77. We make room for symbols up to 999 bytes long. Strictly speaking, the program should check if this limit is exceeded; but really!

```
⟨Global variables 27⟩ +≡
  Char sym_buf[1000];
  Char *sym_ptr;
```

78. The initial ':' of each fully qualified symbol is omitted here, since most users of MMIXAL will probably not need the PREFIX feature. One consequence of this omission is that the one-character symbol ':' itself, which is allowed by the rules of MMIXAL, is printed as the null string.

```
⟨Print symbol sym_buf and its equivalent 78⟩ ≡
  {
    *sym_ptr = '\0';
    fprintf(listing_file, "␣%s␣=␣", sym_buf + 1);
    pp = t→sym;
    if (pp→link ≡ DEFINED) fprintf(listing_file, "#%08x%08x", pp→equiv.h, pp→equiv.l);
    else if (pp→link ≡ REGISTER) fprintf(listing_file, "$%03d", pp→equiv.l);
    else fprintf(listing_file, "?");
    fprintf(listing_file, "␣(%d)\n", pp→serial);
  }
```

This code is used in section 75.

**79.**

```
⟨Report an undefined symbol 79⟩ ≡
  {
    *sym_ptr = (m & #80 ? '?' : t→ch);   /* Unicode? not yet */
    *(sym_ptr + 1) = '\0';
    fprintf(stderr, "undefined␣symbol:␣%s\n", sym_buf + 1);
    err_count++;
    m += 2;
  }
```

This code is used in section 74.

**80.**

```
⟨Check and output the trie 80⟩ ≡
  op_root→mid = Λ;   /* annihilate all the opcodes */
  prune(trie_root);
  sym_ptr = sym_buf;
  if (listing_file) fprintf(listing_file, "\nSymbol␣table:\n");
  mmo_lop(lop_stab, 0, 0);
  out_stab(trie_root);
  while (mmo_ptr & 3) mmo_byte(0);
  mmo_lopp(lop_end, mmo_ptr ≫ 2);
```

This code is used in section 142.

```
ch: unsigned short, §54.
Char = char, §30.
DEFINED = macro, §58.
equiv: octa, §58.
err_count: int, §46.
fprintf: int (), <stdio.h>.
h: tetra, §26.
j: register int, §74.
l: tetra, §26.
link: sym_node *, §58.
listing_file: FILE *, §139.
lop_end = #c, §24.
lop_stab = #b, §24.
m: register int, §74.
mid: trie_node *, §54.
mmo_byte: void (), §48.
mmo_lop: void (), §48.
mmo_lopp: void (), §48.
mmo_ptr: int, §47.
op_root: trie_node *, §56.
out_stab: void, §74.
pp: register sym_node *, §74.
prune: trie_node *(), §73.
REGISTER = macro, §58.
serial: int, §58.
stderr: FILE *, <stdio.h>.
sym: sym_node *, §54.
t: trie_node *, §74.
tetra = unsigned int, §26.
trie_root: trie_node *, §56.
```

81. Expressions. The most intricate part of the assembly process is the task of scanning and evaluating expressions in the operand field. Fortunately, MMIXAL's expressions have a simple structure that can be handled easily with a stack-based approach.

Two stacks hold pending data as the operand field is scanned and evaluated. The  $op\_stack$  contains operators that have not yet been performed; the  $val\_stack$  contains values that have not yet been used. After an entire operand list has been scanned, the  $op\_stack$  will be empty and the  $val\_stack$  will hold the operand values needed to assemble the current instruction.

**82.** Entries on *op\_stack* have one of the constant values defined here, and they have one of the precedence levels defined here.

Entries on *val\_stack* have *equiv*, *link*, and *status* fields; the *link* points to a trie node if the expression is a symbol that has not yet been subjected to any operations.

```
⟨Type definitions 26⟩ +≡
  typedef enum {
    negate, serialize, complement, registerize, inner_lp,
    plus, minus, times, over, frac, mod, shl, shr, and, or, xor,
    outer_lp, outer_rp, inner_rp
  } stack_op;
  typedef enum {
    zero, weak, strong, unary
  } prec;
  typedef enum {
    pure, reg_val, undefined
  } stat;
  typedef struct {
    octa equiv;        /* current value */
    trie_node *link;   /* trie reference for symbol */
    stat status;       /* pure, reg_val, or undefined */
  } val_node;
```

**83.**

```
#define top_op op_stack[op_ptr - 1]       /* top entry on the operator stack */
#define top_val val_stack[val_ptr - 1]    /* top entry on the value stack */
#define next_val val_stack[val_ptr - 2]   /* next-to-top entry of the value stack */

⟨Global variables 27⟩ +≡
  stack_op *op_stack;    /* stack for pending operators */
  int op_ptr;            /* number of items on op_stack */
  val_node *val_stack;   /* stack for pending operands */
  int val_ptr;           /* number of items on val_stack */
  prec precedence[] = {unary, unary, unary, unary, zero,
       weak, weak, strong, strong, strong, strong, strong, strong, strong, weak, weak,
       zero, zero, zero};   /* precedences of the respective stack_op values */
  stack_op rt_op;        /* newly scanned operator */
  octa acc;              /* temporary accumulator */
```

```
84. ⟨Initialize everything 29⟩ +≡
  op_stack = (stack_op *) calloc(buf_size, sizeof(stack_op));
  val_stack = (val_node *) calloc(buf_size, sizeof(val_node));
  if (¬op_stack ∨ ¬val_stack) panic("No␣room␣for␣the␣stacks");
```

**85.** The operand field of an instruction will have been copied into a separate **Char** array called *operand\_list* when we reach this part of the program.

```
⟨Scan the operand field 85⟩ ≡
  p = operand_list;
  val_ptr = 0;   /* val_stack is empty */
  op_stack[0] = outer_lp, op_ptr = 1;   /* op_stack contains an "outer left parenthesis" */
  while (1) {
    ⟨Scan opening tokens until putting something on val_stack 86⟩;
  scan_close: ⟨Scan a binary operator or closing token, rt_op 97⟩;
    while (precedence[top_op] ≥ precedence[rt_op])
      ⟨Perform the top operation on op_stack 98⟩;
  hold_op: op_stack[op_ptr++] = rt_op;
  }
operands_done:
```

This code is used in section 102.

**86.** A comment that follows an empty operand list needs to be detected here.

```
⟨Scan opening tokens until putting something on val_stack 86⟩ ≡
scan_open: if (isletter(*p)) ⟨Scan a symbol 87⟩
  else if (isdigit(*p)) {
    if (*(p + 1) ≡ 'F') ⟨Scan a forward local 88⟩
    else if (*(p + 1) ≡ 'B') ⟨Scan a backward local 89⟩
    else ⟨Scan a decimal constant 94⟩;
  } else switch (*p++) {
    case '#': ⟨Scan a hexadecimal constant 95⟩; break;
    case '\'': ⟨Scan a character constant 92⟩; break;
    case '\"': ⟨Scan a string constant 93⟩; break;
    case '@': ⟨Scan the current location 96⟩; break;
    case '-': op_stack[op_ptr++] = negate;
    case '+': goto scan_open;
    case '&': op_stack[op_ptr++] = serialize; goto scan_open;
    case '~': op_stack[op_ptr++] = complement; goto scan_open;
    case '$': op_stack[op_ptr++] = registerize; goto scan_open;
    case '(': op_stack[op_ptr++] = inner_lp; goto scan_open;
    default:
      if (p ≡ operand_list + 1) {   /* treat operand list as empty */
        operand_list[0] = '0', operand_list[1] = '\0', p = operand_list;
        goto scan_open;
      }
      if (*(p - 1)) derr("syntax␣error␣at␣character␣`%c'", *(p - 1));
      derr("syntax␣error␣after␣character␣`%c'", *(p - 2));
  }
```

This code is used in section 85.

**87.**

```
⟨Scan a symbol 87⟩ ≡
  {
    if (*p ≡ ':') tt = trie_search(trie_root, p + 1);
    else tt = trie_search(cur_prefix, p);
    p = terminator;
  symbol_found: val_ptr++;
    pp = tt→sym;
    if (¬pp) pp = tt→sym = new_sym_node(true);
    top_val.link = tt, top_val.equiv = pp→equiv;
    if (pp→link ≡ PREDEFINED) pp→link = DEFINED;
    top_val.status = (pp→link ≡ DEFINED ? pure : pp→link ≡ REGISTER ? reg_val : undefined);
  }
```

This code is used in section 86.

**88.**

```
⟨Scan a forward local 88⟩ ≡
  {
    tt = &forward_local_host[*p - '0']; p += 2; goto symbol_found;
  }
```

This code is used in section 86.

**89.**

```
⟨Scan a backward local 89⟩ ≡
  {
    tt = &backward_local_host[*p - '0']; p += 2; goto symbol_found;
  }
```

This code is used in section 86.

**90.** Statically allocated variables *forward\_local\_host*[*j*] and *backward\_local\_host*[*j*] masquerade as nodes of the trie.

```
⟨Global variables 27⟩ +≡
  trie_node forward_local_host[10], backward_local_host[10];
  sym_node forward_local[10], backward_local[10];
```

**91.** Initially 0H, 1H, ..., 9H are defined to be zero.

```
⟨Initialize everything 29⟩ +≡
  for (j = 0; j < 10; j++) {
    forward_local_host[j].sym = &forward_local[j];
    backward_local_host[j].sym = &backward_local[j];
    backward_local[j].link = DEFINED;
  }
```

```
complement = 2, §82.
cur_prefix: trie_node *, §56.
DEFINED = macro, §58.
derr = macro (), §45.
equiv: octa, §82.
equiv: octa, §58.
inner_lp = 4, §82.
isdigit: int (), <ctype.h>.
isletter = macro (), §57.
j: register int, §136.
link: trie_node *, §82.
link: sym_node *, §58.
negate = 0, §82.
new_sym_node: sym_node *(), §59.
op_ptr: int, §83.
op_stack: stack_op *, §83.
operand_list: Char *, §33.
p: register Char *, §40.
pp: register sym_node *, §65.
PREDEFINED = macro, §58.
pure = 0, §82.
reg_val = 1, §82.
REGISTER = macro, §58.
registerize = 3, §82.
serialize = 1, §82.
status: stat, §82.
sym: sym_node *, §54.
sym_node = struct, §58.
terminator: Char *, §57.
top_val = macro, §83.
trie_node = struct, §54.
trie_root: trie_node *, §56.
trie_search: trie_node *(), §57.
true = 1, §26.
tt: register trie_node *, §65.
undefined = 2, §82.
val_ptr: int, §83.
val_stack: val_node *, §83.
```

**92.** We have already checked to make sure that the character constant is legal.

```
⟨Scan a character constant 92⟩ ≡
  {
    acc.h = 0, acc.l = (unsigned char) *p;
    p += 2;
    goto constant_found;
  }
```

This code is used in section 86.

**93.**

```
⟨Scan a string constant 93⟩ ≡
  {
    acc.h = 0, acc.l = (unsigned char) *p;
    if (*p ≡ '\"') {
      p++; acc.l = 0; err("*null␣string␣is␣treated␣as␣zero");
    } else if (*(p + 1) ≡ '\"') p += 2;
    else *p = '\"', *--p = ',';
    goto constant_found;
  }
```

This code is used in section 86.

**94.**

```
⟨Scan a decimal constant 94⟩ ≡
  acc.h = 0, acc.l = *p - '0';
  for (p++; isdigit(*p); p++) {
    acc = oplus(acc, shift_left(acc, 2));
    acc = incr(shift_left(acc, 1), *p - '0');
  }
constant_found: val_ptr++;
  top_val.link = Λ;
  top_val.equiv = acc;
  top_val.status = pure;
```

This code is used in section 86.

**95.**

```
⟨Scan a hexadecimal constant 95⟩ ≡
  {
    if (¬isxdigit(*p)) err("illegal␣hexadecimal␣constant");
    acc.h = acc.l = 0;
    for (; isxdigit(*p); p++) {
      acc = incr(shift_left(acc, 4), *p - '0');
      if (*p ≥ 'a') acc = incr(acc, '0' - 'a' + 10);
      else if (*p ≥ 'A') acc = incr(acc, '0' - 'A' + 10);
    }
    goto constant_found;
  }
```

This code is used in section 86.

**96.**

```
⟨Scan the current location 96⟩ ≡
  {
    acc = cur_loc;
    goto constant_found;
  }
```

This code is used in section 86.

**97.**

```
⟨Scan a binary operator or closing token, rt_op 97⟩ ≡
  switch (*p++) {
  case '+': rt_op = plus; break;
  case '-': rt_op = minus; break;
  case '*': rt_op = times; break;
  case '/': if (*p ≠ '/') rt_op = over;
    else p++, rt_op = frac; break;
  case '%': rt_op = mod; break;
  case '<': rt_op = shl; goto sh_check;
  case '>': rt_op = shr;
  sh_check: p++; if (*(p - 1) ≡ *(p - 2)) break;
    derr("syntax␣error␣at␣`%c'", *(p - 2));
  case '&': rt_op = and; break;
  case '|': rt_op = or; break;
  case '^': rt_op = xor; break;
  case ')': rt_op = inner_rp; break;
  case '\0': case ',': rt_op = outer_rp; break;
  default: derr("syntax␣error␣at␣`%c'", *(p - 1));
  }
```

This code is used in section 85.

**98.**

```
⟨Perform the top operation on op_stack 98⟩ ≡
  switch (op_stack[--op_ptr]) {
  case inner_lp: if (rt_op ≡ inner_rp) goto scan_close;
    err("*missing␣right␣parenthesis"); break;
  case outer_lp: if (rt_op ≡ outer_rp) {
      if (top_val.status ≡ reg_val ∧ (top_val.equiv.l > #ff ∨ top_val.equiv.h)) {
        err("*register␣number␣too␣large,␣will␣be␣reduced␣mod␣256");
        top_val.equiv.h = 0, top_val.equiv.l &= #ff;
      }
      if (¬*(p - 1)) goto operands_done;
      else rt_op = outer_lp; goto hold_op;   /* comma */
    } else {
      op_ptr++; err("*missing␣left␣parenthesis");
      goto scan_close;
    }
  ⟨Cases for unary operators 100⟩
  ⟨Cases for binary operators 99⟩
  }
```

This code is used in section 85.

```
acc: octa, §83.
and = 13, §82.
cur_loc: octa, §43.
derr = macro (), §45.
equiv: octa, §82.
err = macro (), §45.
frac = 9, §82.
h: tetra, §26.
hold_op: label, §85.
incr: octa (), MMIX-ARITH §6.
inner_lp = 4, §82.
inner_rp = 18, §82.
isdigit: int (), <ctype.h>.
isxdigit: int (), <ctype.h>.
l: tetra, §26.
link: trie_node *, §82.
minus = 6, §82.
mod = 10, §82.
op_ptr: int, §83.
op_stack: stack_op *, §83.
operands_done: label, §85.
oplus: octa (), MMIX-ARITH §5.
or = 14, §82.
outer_lp = 16, §82.
outer_rp = 17, §82.
over = 8, §82.
p: register Char *, §40.
plus = 5, §82.
pure = 0, §82.
reg_val = 1, §82.
rt_op: stack_op, §83.
scan_close: label, §85.
shift_left: octa (), MMIX-ARITH §7.
shl = 11, §82.
shr = 12, §82.
status: stat, §82.
times = 7, §82.
top_val = macro, §83.
val_ptr: int, §83.
xor = 15, §82.
```

**99.** Now we come to the part where equivalents are changed by unary or binary operators found in the expression being scanned.

The most typical operator, and in some ways the fussiest one to deal with, is binary addition. Once we've written the code for this case, the other cases almost take care of themselves.

```
⟨Cases for binary operators 99⟩ ≡
case plus: if (top_val.status ≡ undefined) err("cannot␣add␣an␣undefined␣quantity");
  if (next_val.status ≡ undefined) err("cannot␣add␣to␣an␣undefined␣quantity");
  if (top_val.status ≡ reg_val ∧ next_val.status ≡ reg_val)
    err("cannot␣add␣two␣register␣numbers");
  next_val.equiv = oplus(next_val.equiv, top_val.equiv);
fin_bin: next_val.status = (top_val.status ≡ next_val.status ? pure : reg_val); val_ptr--;
delink: top_val.link = Λ; break;
```

See also section 101.

This code is used in section 98.

**100.**

```
#define unary_check(verb)
          if (top_val.status ≠ pure) derr("can␣%s␣pure␣values␣only", verb)

⟨Cases for unary operators 100⟩ ≡
case negate: unary_check("negate");
  top_val.equiv = ominus(zero_octa, top_val.equiv); goto delink;
case complement: unary_check("complement");
  top_val.equiv.h = ~top_val.equiv.h, top_val.equiv.l = ~top_val.equiv.l;
  goto delink;
case registerize: unary_check("registerize");
  top_val.status = reg_val; goto delink;
case serialize: if (¬top_val.link) err("can␣take␣serial␣number␣of␣symbol␣only");
  top_val.equiv.h = 0, top_val.equiv.l = top_val.link→sym→serial;
  top_val.status = pure; goto delink;
```

This code is used in section 98.

**101.**

```
#define binary_check(verb)
          if (top_val.status ≠ pure ∨ next_val.status ≠ pure)
            derr("can␣%s␣pure␣values␣only", verb)

⟨Cases for binary operators 99⟩ +≡
case minus: if (top_val.status ≡ undefined)
    err("cannot␣subtract␣an␣undefined␣quantity");
  if (next_val.status ≡ undefined)
    err("cannot␣subtract␣from␣an␣undefined␣quantity");
  if (top_val.status ≡ reg_val ∧ next_val.status ≠ reg_val)
    err("cannot␣subtract␣register␣number␣from␣pure␣value");
  next_val.equiv = ominus(next_val.equiv, top_val.equiv); goto fin_bin;
case times: binary_check("multiply");
  next_val.equiv = omult(next_val.equiv, top_val.equiv); goto fin_bin;
case over: case mod: binary_check("divide");
  if (top_val.equiv.l ≡ 0 ∧ top_val.equiv.h ≡ 0) err("*division␣by␣zero");
  next_val.equiv = odiv(zero_octa, next_val.equiv, top_val.equiv);
  if (op_stack[op_ptr] ≡ mod) next_val.equiv = aux;
  goto fin_bin;
case frac: binary_check("compute␣a␣ratio␣of");
  if (next_val.equiv.h ≥ top_val.equiv.h ∧
      (next_val.equiv.l ≥ top_val.equiv.l ∨ next_val.equiv.h > top_val.equiv.h))
    err("*illegal␣fraction");
  next_val.equiv = odiv(next_val.equiv, zero_octa, top_val.equiv); goto fin_bin;
case shl: case shr: binary_check("compute␣a␣bitwise␣shift␣of");
  if (top_val.equiv.h ∨ top_val.equiv.l > 63) next_val.equiv = zero_octa;
  else if (op_stack[op_ptr] ≡ shl)
    next_val.equiv = shift_left(next_val.equiv, top_val.equiv.l);
  else next_val.equiv = shift_right(next_val.equiv, top_val.equiv.l, 1);
  goto fin_bin;
case and: binary_check("compute␣bitwise␣and␣of");
  next_val.equiv.h &= top_val.equiv.h, next_val.equiv.l &= top_val.equiv.l;
  goto fin_bin;
case or: binary_check("compute␣bitwise␣or␣of");
  next_val.equiv.h |= top_val.equiv.h, next_val.equiv.l |= top_val.equiv.l;
  goto fin_bin;
case xor: binary_check("compute␣bitwise␣xor␣of");
  next_val.equiv.h ⊕= top_val.equiv.h, next_val.equiv.l ⊕= top_val.equiv.l;
  goto fin_bin;
```
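The status rules for addition and subtraction can be summarized in a small sketch. The function names `plus_status` and `minus_status` are hypothetical, and returning −1 is a simplification made here: MMIXAL itself reports the error with *err* and carries on.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { pure, reg_val, undefined } stat;

/* Status of next + top; returns -1 where MMIXAL would report an error. */
int plus_status(stat next, stat top, stat *result)
{
  assert(result != NULL);
  if (top == undefined || next == undefined) return -1;
  if (top == reg_val && next == reg_val) return -1;   /* two register numbers */
  *result = (top == next ? pure : reg_val);           /* the fin_bin rule */
  return 0;
}

/* Status of next - top; returns -1 where MMIXAL would report an error. */
int minus_status(stat next, stat top, stat *result)
{
  assert(result != NULL);
  if (top == undefined || next == undefined) return -1;
  if (top == reg_val && next != reg_val) return -1;   /* pure minus register */
  *result = (top == next ? pure : reg_val);           /* the fin_bin rule */
  return 0;
}
```

So a register number plus a pure value is again a register number, while the difference of two register numbers is pure.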

```
and = 13, §82.
aux: octa, MMIX-ARITH §4.
complement = 2, §82.
derr = macro (), §45.
equiv: octa, §82.
err = macro (), §45.
frac = 9, §82.
h: tetra, §26.
l: tetra, §26.
link: trie_node *, §82.
minus = 6, §82.
mod = 10, §82.
negate = 0, §82.
next_val = macro, §83.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
op_ptr: int, §83.
op_stack: stack_op *, §83.
oplus: octa (), MMIX-ARITH §5.
or = 14, §82.
over = 8, §82.
plus = 5, §82.
pure = 0, §82.
reg_val = 1, §82.
registerize = 3, §82.
serial: int, §58.
serialize = 1, §82.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
shl = 11, §82.
shr = 12, §82.
status: stat, §82.
sym: sym_node *, §54.
times = 7, §82.
top_val = macro, §83.
undefined = 2, §82.
val_ptr: int, §83.
xor = 15, §82.
zero_octa: octa, MMIX-ARITH §4.
```

102. Assembling an instruction. Now let's move up from the expression level to the instruction level. We get to this part of the program at the beginning of a line, or after a semicolon at the end of an instruction earlier on the current line. Our current position in the buffer is the value of *buf\_ptr*.

```
⟨Process the next MMIXAL instruction or comment 102⟩ ≡
  p = buf_ptr; buf_ptr = "";
  ⟨Scan the label field; goto bypass if there is none 103⟩;
  ⟨Scan the opcode field; goto bypass if there is none 104⟩;
  ⟨Copy the operand field 106⟩;
  buf_ptr = p;
  if (spec_mode ∧ ¬(op_bits & spec_bit))
    derr("cannot␣use␣`%s'␣in␣special␣mode", op_field);
  if ((op_bits & no_label_bit) ∧ lab_field[0]) {
    derr("*label␣field␣of␣`%s'␣instruction␣is␣ignored", op_field);
    lab_field[0] = '\0';
  }
  if (op_bits & align_bits) ⟨Align the location pointer 107⟩;
  ⟨Scan the operand field 85⟩;
  if (opcode ≡ GREG) ⟨Allocate a global register 108⟩;
  if (lab_field[0]) ⟨Define the label 109⟩;
  ⟨Do the operation 116⟩;
bypass:
```

This code is used in section 136.

**103.**

```
⟨Scan the label field; goto bypass if there is none 103⟩ ≡
  if (¬*p) goto bypass;
  q = lab_field;
  if (¬isspace(*p)) {
    if (¬isdigit(*p) ∧ ¬isletter(*p)) goto bypass;   /* comment */
    for (*q++ = *p++; isdigit(*p) ∨ isletter(*p); p++, q++) *q = *p;
    if (*p ∧ ¬isspace(*p)) derr("label␣syntax␣error␣at␣`%c'", *p);
  }
  *q = '\0';
  if (isdigit(lab_field[0]) ∧ (lab_field[1] ≠ 'H' ∨ lab_field[2]))
    derr("improper␣local␣label␣`%s'", lab_field);
  for (p++; isspace(*p); p++);
```

This code is used in section 102.

**104.** We copy the opcode field to a special buffer because we might want to refer to the symbolic opcode in error messages.

```
⟨Scan the opcode field; goto bypass if there is none 104⟩ ≡
  q = op_field; while (isletter(*p) ∨ isdigit(*p)) *q++ = *p++; *q = '\0';
  if (¬isspace(*p) ∧ *p ∧ op_field[0]) derr("opcode␣syntax␣error␣at␣`%c'", *p);
  pp = trie_search(op_root, op_field)→sym;
  if (¬pp) {
    if (op_field[0]) derr("unknown␣operation␣code␣`%s'", op_field);
    if (lab_field[0]) derr("*no␣opcode;␣label␣`%s'␣will␣be␣ignored", lab_field);
    goto bypass;
  }
  opcode = pp→equiv.h, op_bits = pp→equiv.l;
  while (isspace(*p)) p++;
```

This code is used in section 102.

**105.**

```
⟨Global variables 27⟩ +≡
  tetra opcode;    /* numeric code for MMIX operation or MMIXAL pseudo-op */
  tetra op_bits;   /* flags describing an operator's special characteristics */
```
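The local-label test at the end of §103 accepts a label that begins with a digit only if it has the exact form *d*H. A standalone restatement follows; the function name `label_ok` is hypothetical, and the sketch returns a flag where MMIXAL calls *derr*.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if lab passes the local-label test, 0 if it would be rejected
   as an improper local label. */
int label_ok(const char *lab)
{
  assert(lab != NULL);
  if (isdigit((unsigned char)lab[0]) && (lab[1] != 'H' || lab[2] != '\0'))
    return 0;
  return 1;
}
```

The short-circuit evaluation matters: `lab[2]` is examined only when `lab[1]` is `'H'`, so a one-character label such as `2` is rejected without reading past its terminating null.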

106. We copy the operand field to a special buffer so that we can change string constants while scanning them later.

```
⟨Copy the operand field 106⟩ ≡
  q = operand_list;
  while (*p) {
    if (*p ≡ ';') break;
    if (*p ≡ '\'') {
      *q++ = *p++;
      if (¬*p) err("incomplete␣character␣constant");
      *q++ = *p++;
      if (*p ≠ '\'') err("illegal␣character␣constant");
    } else if (*p ≡ '\"') {
      for (*q++ = *p++; *p ∧ *p ≠ '\"'; p++, q++) *q = *p;
      if (¬*p) err("incomplete␣string␣constant");
    }
    *q++ = *p++;
    if (isspace(*p)) break;
  }
  while (isspace(*p)) p++;
  if (*p ≡ ';') p++;
  else p = "";   /* if not followed by semicolon, rest of the line is a comment */
  if (q ≡ operand_list) *q++ = '0';   /* change empty operand field to '0' */
  *q = '\0';
```

This code is used in section 102.

```
align_bits = #30000, §62.
buf_ptr: Char *, §33.
derr = macro (), §45.
equiv: octa, §58.
err = macro (), §45.
GREG = #106, §62.
h: tetra, §26.
isdigit: int (), <ctype.h>.
isletter = macro (), §57.
isspace: int (), <ctype.h>.
l: tetra, §26.
lab_field: Char *, §33.
no_label_bit = #40000, §62.
op_field: Char *, §33.
op_root: trie_node *, §56.
operand_list: Char *, §33.
p: register Char *, §40.
pp: register sym_node *, §65.
q: register Char *, §40.
spec_bit = #100000, §62.
spec_mode: bool, §43.
sym: sym_node *, §54.
tetra = unsigned int, §26.
trie_search: trie_node *(), §57.
```

107. It is important to do the alignment in this step before defining the label or evaluating the operand field.

```
⟨Align the location pointer 107⟩ ≡
  {
    j = (op_bits & align_bits) ≫ 16;
    acc.h = -1, acc.l = -(1 ≪ j);
    cur_loc = oand(incr(cur_loc, (1 ≪ j) - 1), acc);
  }
```

This code is used in section 102.

**108.**

```
⟨Allocate a global register 108⟩ ≡
  {
    if (val_stack[0].equiv.l ∨ val_stack[0].equiv.h) {
      for (j = greg; j < 255; j++)
        if (greg_val[j].l ≡ val_stack[0].equiv.l ∧ greg_val[j].h ≡ val_stack[0].equiv.h) {
          cur_greg = j;
          goto got_greg;
        }
    }
    if (greg ≡ 32) err("too␣many␣global␣registers");
    greg--;
    greg_val[greg] = val_stack[0].equiv; cur_greg = greg;
  got_greg: ;
  }
```

This code is used in section 102.
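
The rounding performed in section 107 can be tried in isolation. The following is a minimal sketch and not part of MMIXAL: it assumes a 64-bit `uint64_t` in place of the two-tetra `octa`, and the name `align_up` is hypothetical. It adds 2^j - 1 to the location and then masks with -(2^j), as the `incr` and `oand` calls above do.

```c
#include <assert.h>
#include <stdint.h>

/* Round loc up to the next multiple of 2^j bytes.
   Assumption: uint64_t stands in for MMIXAL's octa (two 32-bit halves). */
static uint64_t align_up(uint64_t loc, unsigned j)
{
    assert(j < 64);                        /* keep the shift in range */
    uint64_t unit = (uint64_t)1 << j;      /* alignment unit, 2^j */
    /* ~(unit - 1) is the same bit pattern as -(unit) modulo 2^64 */
    return (loc + unit - 1) & ~(unit - 1);
}
```

With j = 2 (tetrabyte data) the location #1001 moves up to #1004, while an already aligned location is left unchanged.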

109. If the label is, say 2H, we will already have used the old value of 2B when evaluating the operands. Furthermore, an operand of 2F will have been treated as undefined, which it still is.

Symbols can be defined more than once, but only if each definition gives them the same equivalent value.

A warning message is given when a predefined symbol is being redefined, if its predefined value has already been used.

```
{
   sym_node *new_link = DEFINED;
   acc = cur_loc;
   if (opcode ≡ IS) {
      if (val_stack[0].status ≡ undefined) err("the⊔operand⊔is⊔undefined");
      cur_loc = val_stack[0].equiv;
      if (val_stack[0].status ≡ reg_val) new_link = REGISTER;
   } else if (opcode ≡ GREG) cur_loc.h = 0, cur_loc.l = cur_greg, new_link = REGISTER;
   ⟨Find the symbol table node, pp 111⟩;
   if (pp→link ≡ DEFINED ∨ pp→link ≡ REGISTER) {
      if (pp→equiv.l ≠ cur_loc.l ∨ pp→equiv.h ≠ cur_loc.h ∨ pp→link ≠ new_link) {
         if (pp→serial) derr("symbol⊔`%s'⊔is⊔already⊔defined", lab_field);
         pp→serial = ++serial_number;
         derr("*redefinition⊔of⊔predefined⊔symbol⊔`%s'", lab_field);
      }
```

```
   } else if (pp→link ≡ PREDEFINED) pp→serial = ++serial_number;
   else if (pp→link) {
      if (new_link ≡ REGISTER) err("future⊔reference⊔cannot⊔be⊔to⊔a⊔register");
      do ⟨Fix prior references to this label 112⟩ while (pp→link);
   }
   if (isdigit(lab_field[0])) pp = &backward_local[lab_field[0] - '0'];
   pp→equiv = cur_loc; pp→link = new_link;
   ⟨Fix references that might be in the val_stack 110⟩;
   if (listing_file ∧ (opcode ≡ IS ∨ opcode ≡ LOC))
      ⟨Make special listing to show the label equivalent 115⟩;
   cur_loc = acc;
}
This code is used in section 102.
110. ⟨Fix references that might be in the val_stack 110⟩ ≡
   if (¬isdigit(lab_field[0]))
      for (j = 0; j < val_ptr; j++)
         if (val_stack[j].status ≡ undefined ∧ val_stack[j].link→sym ≡ pp) {
            val_stack[j].status = (new_link ≡ REGISTER ? reg_val : pure);
            val_stack[j].equiv = cur_loc;
         }
This code is used in section 109.
111. ⟨Find the symbol table node, pp 111⟩ ≡
   if (isdigit(lab_field[0])) pp = &forward_local[lab_field[0] - '0'];
   else {
      if (lab_field[0] ≡ ':') tt = trie_search(trie_root, lab_field + 1);
      else tt = trie_search(cur_prefix, lab_field);
      pp = tt→sym;
      if (¬pp) pp = tt→sym = new_sym_node(true);
   }
This code is used in section 109.
```

```
acc: octa, §83.
align_bits = #30000, §62.
backward_local: sym_node [], §90.
cur_greg: int, §143.
cur_loc: octa, §43.
cur_prefix: trie_node *, §56.
DEFINED = macro, §58.
derr = macro(), §45.
equiv: octa, §82.
equiv: octa, §58.
err = macro(), §45.
forward_local: sym_node [], §90.
GREG = #106, §62.
greg: int, §143.
greg_val: octa [], §133.
h: tetra, §26.
incr: octa (), MMIX-ARITH §6.
IS = #101, §62.
isdigit: int (), <ctype.h>.
j: register int, §136.
l: tetra, §26.
lab_field: Char *, §33.
link: sym_node *, §58.
link: trie_node *, §82.
listing_file: FILE *, §139.
LOC = #102, §62.
new_sym_node: sym_node *(), §59.
oand: octa (), MMIX-ARITH §25.
op_bits: tetra, §105.
opcode: tetra, §105.
pp: register sym_node *, §65.
PREDEFINED = macro, §58.
pure = 0, §82.
reg_val = 1, §82.
REGISTER = macro, §58.
serial: int, §58.
serial_number: int, §60.
status: stat, §82.
sym: sym_node *, §54.
sym_node = struct, §58.
trie_root: trie_node *, §56.
trie_search: trie_node *(), §57.
true = 1, §26.
tt: register trie_node *, §65.
undefined = 2, §82.
val_ptr: int, §83.
val_stack: val_node *, §83.
```

```
112. ⟨Fix prior references to this label 112⟩ ≡
   {
      qq = pp→link;
      pp→link = qq→link;
      mmo_loc();
      if (qq→serial ≡ fix_o) ⟨Fix a future reference from an octabyte 113⟩
      else ⟨Fix a future reference from a relative address 114⟩;
      recycle_fixup(qq);
   }
This code is used in section 109.
113. ⟨Fix a future reference from an octabyte 113⟩ ≡
   {
      if (qq→equiv.h & #ffffff) {
         mmo_lop(lop_fixo, 0, 2);
         mmo_tetra(qq→equiv.h);
      } else mmo_lop(lop_fixo, qq→equiv.h ≫ 24, 1);
      mmo_tetra(qq→equiv.l);
   }
This code is used in section 112.
114. ⟨Fix a future reference from a relative address 114⟩ ≡
   {
      octa o;
      o = ominus(cur_loc, qq→equiv);
      if (o.l & 3)
         dderr("*relative⊔address⊔in⊔location⊔#%08x%08x⊔not⊔divisible⊔by⊔4",
               qq→equiv.h, qq→equiv.l);
      o = shift_right(o, 2, 0); k = 0;
      if (o.h ≡ 0)
         if (o.l < #10000) mmo_lopp(lop_fixr, o.l);
         else if (qq→serial ≡ fix_xyz ∧ o.l < #1000000) {
            mmo_lop(lop_fixrx, 0, 24); mmo_tetra(o.l);
         } else k = 1;
      else if (o.h ≡ #ffffffff)
         if (qq→serial ≡ fix_xyz ∧ o.l ≥ #ff000000) {
            mmo_lop(lop_fixrx, 0, 24); mmo_tetra(o.l & #1ffffff);
         } else if (qq→serial ≡ fix_yz ∧ o.l ≥ #ffff0000) {
            mmo_lop(lop_fixrx, 0, 16); mmo_tetra(o.l & #100ffff);
         } else k = 1;
      else k = 1;
      if (k) dderr("relative⊔address⊔in⊔location⊔#%08x%08x⊔is⊔too⊔far⊔away",
            qq→equiv.h, qq→equiv.l);
   }
This code is used in section 112.
```

```
115. ⟨Make special listing to show the label equivalent 115⟩ ≡
   if (new_link ≡ DEFINED) {
      fprintf(listing_file, "(%08x%08x)", cur_loc.h, cur_loc.l);
      flush_listing_line("⊔");
   } else {
      fprintf(listing_file, "($%03d)", cur_loc.l & #ff);
      flush_listing_line("⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔");
   }
This code is used in section 109.
116. ⟨Do the operation 116⟩ ≡
   future_bits = 0;
   if (op_bits & many_arg_bit) ⟨Do a many-operand operation 117⟩
   else switch (val_ptr) {
   case 1: if (¬(op_bits & one_arg_bit))
         derr("opcode⊔`%s'⊔needs⊔more⊔than⊔one⊔operand", op_field);
      ⟨Do a one-operand operation 129⟩;
   case 2: if (¬(op_bits & two_arg_bit))
         if (op_bits & one_arg_bit)
            derr("opcode⊔`%s'⊔must⊔not⊔have⊔two⊔operands", op_field)
         else derr("opcode⊔`%s'⊔must⊔have⊔more⊔than⊔two⊔operands", op_field);
      if ((op_bits & (three_arg_bit + mem_bit)) ≡ three_arg_bit) goto make_two_three;
      ⟨Do a two-operand operation 124⟩;
   make_two_three: val_stack[2] = val_stack[1], val_ptr = 3;
      val_stack[1].equiv = zero_octa, val_stack[1].link = Λ, val_stack[1].status = pure;
         /* insert 0 as the second operand */
   case 3: if (¬(op_bits & three_arg_bit))
         derr("opcode⊔`%s'⊔must⊔not⊔have⊔three⊔operands", op_field);
      ⟨Do a three-operand operation 119⟩;
   default: derr("too⊔many⊔operands⊔for⊔opcode⊔`%s'", op_field);
   }
This code is used in section 102.
```

```
cur_loc: octa, §43.
dderr = macro(), §45.
DEFINED = macro, §58.
derr = macro(), §45.
equiv: octa, §58.
equiv: octa, §82.
fix_o = 0, §58.
fix_xyz = 2, §58.
fix_yz = 1, §58.
flush_listing_line: void (), §41.
fprintf: int (), <stdio.h>.
future_bits: int, §120.
h: tetra, §26.
k: register int, §136.
l: tetra, §26.
link: sym_node *, §58.
link: trie_node *, §82.
listing_file: FILE *, §139.
lop_fixo = #3, §24.
lop_fixr = #4, §24.
lop_fixrx = #5, §24.
many_arg_bit = #8000, §62.
mem_bit = #80000, §62.
mmo_loc: void (), §49.
mmo_lop: void (), §48.
mmo_lopp: void (), §48.
mmo_tetra: void (), §48.
new_link: sym_node *, §109.
octa = struct, §26.
ominus: octa (), MMIX-ARITH §5.
one_arg_bit = #1000, §62.
op_bits: tetra, §105.
op_field: Char *, §33.
pp: register sym_node *, §65.
pure = 0, §82.
qq: register sym_node *, §65.
recycle_fixup = macro(), §59.
serial: int, §58.
shift_right: octa (), MMIX-ARITH §7.
status: stat, §82.
three_arg_bit = #4000, §62.
two_arg_bit = #2000, §62.
val_ptr: int, §83.
val_stack: val_node *, §83.
zero_octa: octa, MMIX-ARITH §4.
```

```
117. The many-operand operators are BYTE, WYDE, TETRA, and OCTA.
⟨Do a many-operand operation 117⟩ ≡
   for (j = 0; j < val_ptr; j++) {
      ⟨Deal with cases where val_stack[j] is impure 118⟩;
      k = 1 ≪ (opcode - BYTE);
      if ((val_stack[j].equiv.h ∧ opcode < OCTA) ∨
            (val_stack[j].equiv.l > #ffff ∧ opcode < TETRA) ∨
            (val_stack[j].equiv.l > #ff ∧ opcode < WYDE))
         if (k ≡ 1) err("*constant⊔doesn't⊔fit⊔in⊔one⊔byte")
         else derr("*constant⊔doesn't⊔fit⊔in⊔%d⊔bytes", k);
      if (k < 8) assemble(k, val_stack[j].equiv.l, 0);
      else if (val_stack[j].status ≡ undefined) assemble(4, 0, #f0), assemble(4, 0, #f0);
      else assemble(4, val_stack[j].equiv.h, 0), assemble(4, val_stack[j].equiv.l, 0);
   }
This code is used in section 116.
118. ⟨Deal with cases where val_stack[j] is impure 118⟩ ≡
   if (val_stack[j].status ≡ reg_val) err("*register⊔number⊔used⊔as⊔a⊔constant")
   else if (val_stack[j].status ≡ undefined) {
      if (opcode ≠ OCTA) err("undefined⊔constant");
      pp = val_stack[j].link→sym;
      qq = new_sym_node(false);
      qq→link = pp→link;
      pp→link = qq;
      qq→serial = fix_o;
      qq→equiv = cur_loc;
   }
This code is used in section 117.
119. ⟨Do a three-operand operation 119⟩ ≡
   ⟨Do the Z field 121⟩;
   ⟨Do the Y field 122⟩;
assemble_X: ⟨Do the X field 123⟩;
assemble_inst: assemble(4, (opcode ≪ 24) + xyz, future_bits);
   break;
This code is used in section 116.
120. Individual fields of an instruction are placed into global variables z, y, x, yz, and/or xyz.
⟨Global variables 27⟩ +≡
   tetra z, y, x, yz, xyz;   /* pieces for assembly */
   int future_bits;   /* places where there are future references */
121. ⟨Do the Z field 121⟩ ≡
   if (val_stack[2].status ≡ undefined) err("Z⊔field⊔is⊔undefined");
   if (val_stack[2].status ≡ reg_val) {
      if (¬(op_bits & (immed_bit + zr_bit + zar_bit)))
         derr("*Z⊔field⊔of⊔`%s'⊔should⊔not⊔be⊔a⊔register⊔number", op_field);
   } else if (op_bits & immed_bit) opcode++;   /* immediate */
```

```
   else if (op_bits & zr_bit)
      derr("*Z⊔field⊔of⊔`%s'⊔should⊔be⊔a⊔register⊔number", op_field);
   if (val_stack[2].equiv.h ∨ val_stack[2].equiv.l > #ff)
      err("*Z⊔field⊔doesn't⊔fit⊔in⊔one⊔byte");
   z = val_stack[2].equiv.l & #ff;
This code is used in section 119.
122. ⟨Do the Y field 122⟩ ≡
   if (val_stack[1].status ≡ undefined) err("Y⊔field⊔is⊔undefined");
   if (val_stack[1].status ≡ reg_val) {
      if (¬(op_bits & (yr_bit + yar_bit)))
         derr("*Y⊔field⊔of⊔`%s'⊔should⊔not⊔be⊔a⊔register⊔number", op_field);
   } else if (op_bits & yr_bit)
      derr("*Y⊔field⊔of⊔`%s'⊔should⊔be⊔a⊔register⊔number", op_field);
   if (val_stack[1].equiv.h ∨ val_stack[1].equiv.l > #ff)
      err("*Y⊔field⊔doesn't⊔fit⊔in⊔one⊔byte");
   y = val_stack[1].equiv.l & #ff; yz = (y ≪ 8) + z;
This code is used in section 119.
123. ⟨Do the X field 123⟩ ≡
   if (val_stack[0].status ≡ undefined) err("X⊔field⊔is⊔undefined");
   if (val_stack[0].status ≡ reg_val) {
      if (¬(op_bits & (xr_bit + xar_bit)))
         derr("*X⊔field⊔of⊔`%s'⊔should⊔not⊔be⊔a⊔register⊔number", op_field);
   } else if (op_bits & xr_bit)
      derr("*X⊔field⊔of⊔`%s'⊔should⊔be⊔a⊔register⊔number", op_field);
   if (val_stack[0].equiv.h ∨ val_stack[0].equiv.l > #ff)
      err("*X⊔field⊔doesn't⊔fit⊔in⊔one⊔byte");
   x = val_stack[0].equiv.l & #ff; xyz = (x ≪ 16) + yz;
This code is used in section 119.
```
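
Sections 119 through 123 build an instruction from one-byte fields: `yz = (y ≪ 8) + z`, `xyz = (x ≪ 16) + yz`, and finally `(opcode ≪ 24) + xyz`. The following is a minimal sketch of that packing, not MMIXAL's own code; the function name `pack_instruction` is hypothetical, and the range checks that MMIXAL reports as warnings are reduced here to a single assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an opcode and the X, Y, Z fields into one 32-bit tetrabyte.
   Assumption: every argument already fits in one byte. */
static uint32_t pack_instruction(uint32_t opcode, uint32_t x,
                                 uint32_t y, uint32_t z)
{
    assert(opcode <= 0xff && x <= 0xff && y <= 0xff && z <= 0xff);
    uint32_t yz  = (y << 8) + z;     /* as at the end of section 122 */
    uint32_t xyz = (x << 16) + yz;   /* as at the end of section 123 */
    return (opcode << 24) + xyz;     /* as at assemble_inst in section 119 */
}
```

For instance, an opcode byte of #20 with X = 1, Y = 2, Z = 3 packs to the tetrabyte #20010203.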

```
assemble: void (), §52.
BYTE = #108, §62.
cur_loc: octa, §43.
derr = macro(), §45.
equiv: octa, §82.
equiv: octa, §58.
err = macro(), §45.
false = 0, §26.
fix_o = 0, §58.
h: tetra, §26.
immed_bit = #2, §62.
j: register int, §136.
k: register int, §136.
l: tetra, §26.
link: trie_node *, §82.
link: sym_node *, §58.
new_sym_node: sym_node *(), §59.
OCTA = #10b, §62.
op_bits: tetra, §105.
op_field: Char *, §33.
opcode: tetra, §105.
pp: register sym_node *, §65.
qq: register sym_node *, §65.
reg_val = 1, §82.
serial: int, §58.
status: stat, §82.
sym: sym_node *, §54.
TETRA = #10a, §62.
tetra = unsigned int, §26.
undefined = 2, §82.
val_ptr: int, §83.
val_stack: val_node *, §83.
WYDE = #109, §62.
xar_bit = #40, §62.
xr_bit = #80, §62.
yar_bit = #10, §62.
yr_bit = #20, §62.
zar_bit = #4, §62.
zr_bit = #8, §62.
```

```
124. ⟨Do a two-operand operation 124⟩ ≡
   if (val_stack[1].status ≡ undefined) {
      if (op_bits & rel_addr_bit)
         ⟨Assemble YZ as a future reference and goto assemble_X 125⟩
      else err("YZ⊔field⊔is⊔undefined");
   } else if (val_stack[1].status ≡ reg_val) {
      if (¬(op_bits & (immed_bit + yzr_bit + yzar_bit)))
         derr("*YZ⊔field⊔of⊔`%s'⊔should⊔not⊔be⊔a⊔register⊔number", op_field);
      if (opcode ≡ SET) val_stack[1].equiv.l ≪= 8, opcode = #c1;   /* change to OR */
      else if (op_bits & mem_bit) val_stack[1].equiv.l ≪= 8, opcode++;
            /* silently append ,0 */
   } else {   /* val_stack[1].status ≡ pure */
      if (op_bits & mem_bit)
         ⟨Assemble YZ as a memory address and goto assemble_X 127⟩;
      if (opcode ≡ SET) opcode = #e3;   /* change to SETL */
      else if (op_bits & immed_bit) opcode++;   /* immediate */
      else if (op_bits & yzr_bit) {
         derr("*YZ⊔field⊔of⊔`%s'⊔should⊔be⊔a⊔register⊔number", op_field);
      }
      if (op_bits & rel_addr_bit)
         ⟨Assemble YZ as a relative address and goto assemble_X 126⟩;
   }
   if (val_stack[1].equiv.h ∨ val_stack[1].equiv.l > #ffff)
      err("*YZ⊔field⊔doesn't⊔fit⊔in⊔two⊔bytes");
   yz = val_stack[1].equiv.l & #ffff;
   goto assemble_X;
This code is used in section 116.
125. ⟨Assemble YZ as a future reference and goto assemble_X 125⟩ ≡
   {
      pp = val_stack[1].link→sym;
      qq = new_sym_node(false);
      qq→link = pp→link;
      pp→link = qq;
      qq→serial = fix_yz;
      qq→equiv = cur_loc;
      yz = 0;
      future_bits = #c0;
      goto assemble_X;
   }
This code is used in section 124.
126. ⟨Assemble YZ as a relative address and goto assemble_X 126⟩ ≡
   {
      octa source, dest;
      if (val_stack[1].equiv.l & 3) err("*relative⊔address⊔is⊔not⊔divisible⊔by⊔4");
      source = shift_right(cur_loc, 2, 0);
      dest = shift_right(val_stack[1].equiv, 2, 0);
      acc = ominus(dest, source);
```

```
      if (¬(acc.h & #80000000)) {
         if (acc.l > #ffff ∨ acc.h)
            err("relative⊔address⊔is⊔more⊔than⊔#ffff⊔tetrabytes⊔forward");
      } else {
         acc = incr(acc, #10000);
         opcode++;
         if (acc.l > #ffff ∨ acc.h)
            err("relative⊔address⊔is⊔more⊔than⊔#10000⊔tetrabytes⊔backward");
      }
      yz = acc.l;
      goto assemble_X;
   }
This code is used in section 124.
127. ⟨Assemble YZ as a memory address and goto assemble_X 127⟩ ≡
   {
      octa o;
      o = val_stack[1].equiv, k = 0;
      for (j = greg; j < 255; j++)
         if (greg_val[j].h ∨ greg_val[j].l) {
            acc = ominus(val_stack[1].equiv, greg_val[j]);
            if (acc.h ≤ o.h ∧ (acc.l ≤ o.l ∨ acc.h < o.h)) o = acc, k = j;
         }
      if (o.l ≤ #ff ∧ ¬o.h ∧ k) yz = (k ≪ 8) + o.l, opcode++;
      else if (¬expanding) err("no⊔base⊔address⊔is⊔close⊔enough⊔to⊔the⊔address⊔A")
      else ⟨Assemble instructions to put supplementary data in $255 128⟩;
      goto assemble_X;
   }
This code is used in section 124.
```
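
The relative-address rule of section 126 can be sketched on its own. This is an illustration under stated assumptions rather than MMIXAL's code: `uint64_t` replaces `octa`, the name `encode_rel_yz` is hypothetical, and a misaligned target is rejected outright, whereas MMIXAL only issues a warning (the message begins with `*`) and continues.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the 16-bit YZ field for a branch from cur_loc to target.
   Returns 0 on success and -1 on a misaligned or out-of-range target.
   *backward is set to 1 when the opcode must be incremented. */
static int encode_rel_yz(uint64_t cur_loc, uint64_t target,
                         uint32_t *yz, int *backward)
{
    assert(yz != 0 && backward != 0);
    if (target & 3) return -1;                    /* not divisible by 4 */
    /* distance in tetrabytes, computed modulo 2^64 like ominus */
    uint64_t acc = (target >> 2) - (cur_loc >> 2);
    if (!(acc >> 63)) {                           /* sign bit clear: forward */
        if (acc > 0xffff) return -1;
        *backward = 0;
    } else {                                      /* backward */
        acc += 0x10000;
        if (acc > 0xffff) return -1;
        *backward = 1;
    }
    *yz = (uint32_t)acc;
    return 0;
}
```

A branch from #108 back to #100 is two tetrabytes backward, so the field becomes #10000 - 2 = #fffe and the backward form of the opcode is selected.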

```
acc: octa, §83.
assemble_X: label, §119.
cur_loc: octa, §43.
derr = macro(), §45.
equiv: octa, §82.
equiv: octa, §58.
err = macro(), §45.
expanding: int, §139.
false = 0, §26.
fix_yz = 1, §58.
future_bits: int, §120.
greg: int, §143.
greg_val: octa [], §133.
h: tetra, §26.
immed_bit = #2, §62.
incr: octa (), MMIX-ARITH §6.
j: register int, §136.
k: register int, §136.
l: tetra, §26.
link: trie_node *, §82.
link: sym_node *, §58.
mem_bit = #80000, §62.
new_sym_node: sym_node *(), §59.
octa = struct, §26.
ominus: octa (), MMIX-ARITH §5.
op_bits: tetra, §105.
op_field: Char *, §33.
opcode: tetra, §105.
pp: register sym_node *, §65.
pure = 0, §82.
qq: register sym_node *, §65.
reg_val = 1, §82.
rel_addr_bit = #1, §62.
serial: int, §58.
SET = #100, §62.
shift_right: octa (), MMIX-ARITH §7.
status: stat, §82.
sym: sym_node *, §54.
undefined = 2, §82.
val_stack: val_node *, §83.
yz: tetra, §120.
yzar_bit = #100, §62.
yzr_bit = #200, §62.
```

```
128.
#define SETH #e0
#define SETL #e3
#define ORH #e8
#define ORL #eb
⟨Assemble instructions to put supplementary data in $255 128⟩ ≡
   {
      for (j = SETH; j ≤ ORL; j++) {
         switch (j & 3) {
         case 0: yz = o.h ≫ 16; break;   /* SETH */
         case 1: yz = o.h & #ffff; break;   /* SETMH or ORMH */
         case 2: yz = o.l ≫ 16; break;   /* SETML or ORML */
         case 3: yz = o.l & #ffff; break;   /* SETL or ORL */
         }
         if (yz ∨ j ≡ SETL) {
            assemble(4, (j ≪ 24) + (255 ≪ 16) + yz, 0);
            j |= ORH;
         }
      }
      if (k) yz = (k ≪ 8) + 255;   /* Y = $k, Z = $255 */
      else yz = 255 ≪ 8, opcode++;   /* Y = $255, Z = 0 */
   }
This code is used in section 127.
129. ⟨Do a one-operand operation 129⟩ ≡
   if (val_stack[0].status ≡ undefined) {
      if (op_bits & rel_addr_bit)
         ⟨Assemble XYZ as a future reference and goto assemble_inst 130⟩
      else if (opcode ≠ PREFIX) err("the⊔operand⊔is⊔undefined");
   } else if (val_stack[0].status ≡ reg_val) {
      if (¬(op_bits & (xyzr_bit + xyzar_bit)))
         derr("*operand⊔of⊔`%s'⊔should⊔not⊔be⊔a⊔register⊔number", op_field);
   } else {   /* val_stack[0].status ≡ pure */
      if (op_bits & xyzr_bit)
         derr("*operand⊔of⊔`%s'⊔should⊔be⊔a⊔register⊔number", op_field);
      if (op_bits & rel_addr_bit)
         ⟨Assemble XYZ as a relative address and goto assemble_inst 131⟩;
   }
   if (opcode > #ff) ⟨Do a pseudo-operation and goto bypass 132⟩;
   if (val_stack[0].equiv.h ∨ val_stack[0].equiv.l > #ffffff)
      err("*XYZ⊔field⊔doesn't⊔fit⊔in⊔three⊔bytes");
   xyz = val_stack[0].equiv.l & #ffffff;
   goto assemble_inst;
This code is used in section 116.
130. ⟨Assemble XYZ as a future reference and goto assemble_inst 130⟩ ≡
   {
      pp = val_stack[0].link→sym;
      qq = new_sym_node(false);
      qq→link = pp→link;
```

```
      pp→link = qq;
      qq→serial = fix_xyz;
      qq→equiv = cur_loc;
      xyz = 0;
      future_bits = #e0;
      goto assemble_inst;
   }
This code is used in section 129.
131. ⟨Assemble XYZ as a relative address and goto assemble_inst 131⟩ ≡
   {
      octa source, dest;
      if (val_stack[0].equiv.l & 3) err("*relative⊔address⊔is⊔not⊔divisible⊔by⊔4");
      source = shift_right(cur_loc, 2, 0);
      dest = shift_right(val_stack[0].equiv, 2, 0);
      acc = ominus(dest, source);
      if (¬(acc.h & #80000000)) {
         if (acc.l > #ffffff ∨ acc.h)
            err("relative⊔address⊔is⊔more⊔than⊔#ffffff⊔tetrabytes⊔forward");
      } else {
         acc = incr(acc, #1000000);
         opcode++;
         if (acc.l > #ffffff ∨ acc.h)
            err("relative⊔address⊔is⊔more⊔than⊔#1000000⊔tetrabytes⊔backward");
      }
      xyz = acc.l;
      goto assemble_inst;
   }
This code is used in section 129.
```
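
The loop of section 128, which loads an arbitrary 64-bit value into $255 with at most four instructions, can be exercised separately. The sketch below is an assumption-laden illustration and not MMIXAL's code: `uint64_t` replaces the `octa` variable `o`, the function name `load_255` and the output array are hypothetical stand-ins for the calls to `assemble`, and only the opcode values #e0, #e3, #e8, #eb are taken from the definitions in section 128.

```c
#include <assert.h>
#include <stdint.h>

enum { SETH = 0xe0, SETL = 0xe3, ORH = 0xe8, ORL = 0xeb };

/* Write into out[] the instruction words that place the value o in
   register $255, and return how many were written (1 to 4). */
static int load_255(uint64_t o, uint32_t out[4])
{
    int n = 0;
    assert(out != 0);
    for (uint32_t j = SETH; j <= ORL; j++) {
        /* j & 3 selects the wyde: 0 is the highest, 3 the lowest */
        uint32_t yz = (uint32_t)(o >> (16 * (3 - (j & 3)))) & 0xffff;
        if (yz || j == SETL) {
            out[n++] = (j << 24) + (255u << 16) + yz;
            j |= ORH;    /* every later wyde is OR-ed in, not SET */
        }
    }
    return n;
}
```

Zero wydes are skipped, so the value #0000000100000002 needs only a SETMH followed by an ORL; the value zero still produces one SETL instruction because of the `j == SETL` test.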

```
acc: octa, §83.
assemble: void (), §52.
assemble_inst: label, §119.
bypass: label, §102.
cur_loc: octa, §43.
derr = macro(), §45.
equiv: octa, §82.
equiv: octa, §58.
err = macro(), §45.
false = 0, §26.
fix_xyz = 2, §58.
future_bits: int, §120.
h: tetra, §26.
incr: octa (), MMIX-ARITH §6.
j: register int, §136.
k: register int, §136.
l: tetra, §26.
link: trie_node *, §82.
link: sym_node *, §58.
new_sym_node: sym_node *(), §59.
o: octa, §127.
octa = struct, §26.
ominus: octa (), MMIX-ARITH §5.
op_bits: tetra, §105.
op_field: Char *, §33.
opcode: tetra, §105.
pp: register sym_node *, §65.
PREFIX = #103, §62.
pure = 0, §82.
qq: register sym_node *, §65.
reg_val = 1, §82.
rel_addr_bit = #1, §62.
serial: int, §58.
shift_right: octa (), MMIX-ARITH §7.
status: stat, §82.
sym: sym_node *, §54.
undefined = 2, §82.
val_stack: val_node *, §83.
xyz: tetra, §120.
xyzar_bit = #400, §62.
xyzr_bit = #800, §62.
yz: tetra, §120.
```

```
132. ⟨Do a pseudo-operation and goto bypass 132⟩ ≡
   switch (opcode) {
   case LOC: cur_loc = val_stack[0].equiv;
   case IS: goto bypass;
   case PREFIX: if (¬val_stack[0].link) err("not⊔a⊔valid⊔prefix");
      cur_prefix = val_stack[0].link; goto bypass;
   case GREG: if (listing_file) ⟨Make listing for GREG 134⟩;
      goto bypass;
   case LOCAL: if (val_stack[0].equiv.l > lreg) lreg = val_stack[0].equiv.l;
      if (listing_file) {
         fprintf(listing_file, "($%03d)", val_stack[0].equiv.l);
         flush_listing_line("⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔");
      }
      goto bypass;
   case BSPEC: if (val_stack[0].equiv.l > #ffff ∨ val_stack[0].equiv.h)
         err("*operand⊔of⊔`BSPEC'⊔doesn't⊔fit⊔in⊔two⊔bytes");
      mmo_loc(); mmo_sync();
      mmo_lopp(lop_spec, val_stack[0].equiv.l);
      spec_mode = true; spec_mode_loc = 0; goto bypass;
   case ESPEC: spec_mode = false; goto bypass;
   }
This code is used in section 129.
133. ⟨Global variables 27⟩ +≡
   octa greg_val[256];   /* initial values of global registers */
134. ⟨Make listing for GREG 134⟩ ≡
   if (val_stack[0].equiv.l ∨ val_stack[0].equiv.h) {
      fprintf(listing_file, "($%03d=#%08x", cur_greg, val_stack[0].equiv.h);
      flush_listing_line("⊔⊔⊔⊔");
      fprintf(listing_file, "⊔⊔⊔⊔⊔⊔⊔⊔⊔%08x)", val_stack[0].equiv.l);
      flush_listing_line("⊔");
   } else {
      fprintf(listing_file, "($%03d)", cur_greg);
      flush_listing_line("⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔⊔");
   }
This code is used in section 132.
```

**135. Running the program.** On a UNIX-like system, the command

    mmixal [options] sourcefilename

will assemble the MMIXAL program in file sourcefilename, writing any error messages on the standard error file. (Nothing is written to the standard output.) The options, which may appear in any order, are:

- -o objectfilename Send the output to a binary file called objectfilename. If no -o specification is given, the object file name is obtained from the input file name by changing the final letter from 's' to 'o', or by appending '.mmo' if sourcefilename doesn't end with 's'.
- -l listingname Output a listing of the assembled input and output to a text file called listingname.
- -x Expand memory-oriented commands that cannot be assembled as single instructions, by assembling auxiliary instructions that make temporary use of global register \$255.
- -b bufsize Allow up to bufsize characters per line of input.
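
The default naming rule for the object file, as described under the -o option, can be sketched in isolation as follows. This is only an illustration under stated assumptions: the helper name `default_obj_name` and its bounded-buffer interface are hypothetical and are not part of MMIXAL, which performs the same rule inline with `strcpy` and `sprintf`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of MMIXAL): derive the default object
   file name from the source file name, following the rule stated for -o.
   A final 's' becomes 'o'; otherwise ".mmo" is appended. */
static void default_obj_name(const char *src, char *obj, size_t obj_size)
{
    size_t n;
    assert(src != NULL && obj != NULL && obj_size > 0);
    n = strlen(src);
    if (n > 0 && src[n - 1] == 's') {
        snprintf(obj, obj_size, "%s", src);     /* copy, truncating if needed */
        if (n < obj_size) obj[n - 1] = 'o';     /* change the final letter */
    } else {
        snprintf(obj, obj_size, "%s.mmo", src); /* append the extension */
    }
}
```

Under this sketch, a source file named `hello.mms` yields `hello.mmo`, while `hello.txt` yields `hello.txt.mmo`.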

BSPEC = #104, §62.
bypass: label, §102.
cur\_greg: int, §143.
cur\_loc: octa, §43.
cur\_prefix: trie\_node \*, §56.
equiv: octa, §82.
err = macro (), §45.
ESPEC = #105, §62.
false = 0, §26.
flush\_listing\_line: void (), §41.
fprintf: int (), <stdio.h>.

GREG = #106, §62.
h: tetra, §26.
IS = #101, §62.
l: tetra, §26.
link: trie\_node \*, §82.
listing\_file: FILE \*, §139.
LOC = #102, §62.
LOCAL = #107, §62.
lop\_spec = #8, §24.
lreg: int, §143.
mmo\_loc: void (), §49.
mmo\_lopp: void (), §48.
mmo\_sync: void (), §50.
octa = struct, §26.
opcode: tetra, §105.
PREFIX = #103, §62.
spec\_mode: bool, §43.
spec\_mode\_loc: tetra, §43.
true = 1, §26.
val\_stack: val\_node \*, §83.

**136.** Here, finally, is the overall structure of this program.

```
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
  ⟨ Preprocessor definitions 31 ⟩
  ⟨ Type definitions 26 ⟩
  ⟨ Global variables 27 ⟩
  ⟨ Subroutines 28 ⟩
  int main(argc, argv)
       int argc; char *argv[];
  {
    register int j, k;    /* all-purpose integers */
    ⟨ Local variables 40 ⟩;
    ⟨ Process the command line 137 ⟩;
    ⟨ Initialize everything 29 ⟩;
    while (1) {
      ⟨ Get the next line of input text, or break if the input has ended 34 ⟩;
      while (1) {
        ⟨ Process the next MMIXAL instruction or comment 102 ⟩;
        if (¬*buf_ptr) break;
      }
      if (listing_file) {
        if (listing_bits) listing_clear();
        else if (¬line_listed) flush_listing_line("␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣");
      }
    }
    ⟨ Finish the assembly 142 ⟩;
  }
137. The space after "-b" is optional, because MMIX-SIM does not use a space in this context.
⟨ Process the command line 137 ⟩ ≡
  for (j = 1; j < argc - 1 ∧ argv[j][0] ≡ '-'; j++)
    if (¬argv[j][2]) {
      if (argv[j][1] ≡ 'x') expanding = 1;
      else if (argv[j][1] ≡ 'o') j++, strcpy(obj_file_name, argv[j]);
      else if (argv[j][1] ≡ 'l') j++, strcpy(listing_name, argv[j]);
      else if (argv[j][1] ≡ 'b' ∧ sscanf(argv[j + 1], "%d", &buf_size) ≡ 1) j++;
      else break;
    } else if (argv[j][1] ≠ 'b' ∨ sscanf(argv[j] + 2, "%d", &buf_size) ≠ 1) break;
  if (j ≠ argc - 1) {
    fprintf(stderr, "Usage:␣%s␣%s␣sourcefilename\n", argv[0],
        "[-x]␣[-l␣listingname]␣[-b␣buffersize]␣[-o␣objectfilename]");
    exit(-1);
  }
  src_file_name = argv[j];
This code is used in section 136.
138. ⟨ Open the files 138 ⟩ ≡
  src_file = fopen(src_file_name, "r");
  if (¬src_file) dpanic("Can't␣open␣the␣source␣file␣%s", src_file_name);
  if (¬obj_file_name[0]) {
     j = strlen(src_file_name);
     if (src_file_name[j - 1] ≡ 's') {
        strcpy(obj_file_name, src_file_name); obj_file_name[j - 1] = 'o';
     }
     else sprintf(obj_file_name, "%s.mmo", src_file_name);
  }
  obj_file = fopen(obj_file_name, "wb");
  if (¬obj_file) dpanic("Can't␣open␣the␣object␣file␣%s", obj_file_name);
  if (listing_name[0]) {
     listing_file = fopen(listing_name, "w");
     if (¬listing_file) dpanic("Can't␣open␣the␣listing␣file␣%s", listing_name);
  }
This code is used in section 140.
139. ⟨ Global variables 27 ⟩ +≡
  char *src_file_name;    /* name of the MMIXAL input file */
  char obj_file_name[FILENAME_MAX + 1];    /* name of the binary output file */
  char listing_name[FILENAME_MAX + 1];    /* name of the optional listing file */
  FILE *src_file, *obj_file, *listing_file;
  int expanding;    /* are we expanding instructions when base address fail? */
  int buf_size;    /* maximum number of characters per line of input */
140. ⟨ Initialize everything 29 ⟩ +≡
  ⟨ Open the files 138 ⟩;
  filename[0] = src_file_name;
  filename_count = 1;
  ⟨ Output the preamble 141 ⟩;
141. ⟨ Output the preamble 141 ⟩ ≡
  mmo_lop(lop_pre, 1, 1);
  mmo_tetra(time(Λ));
  mmo_cur_file = -1;
This code is used in section 140.
```
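
The treatment of "-b" in section 137, where the buffer size may either follow as a separate argument or be attached directly to the option, can be illustrated on its own. The following is a minimal sketch, assuming a hypothetical helper `parse_b_option` that reports how many command-line arguments were consumed; MMIXAL itself handles this inline within its option loop.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of MMIXAL): try to read a "-b" option
   starting at argv[j]. Returns 2 for the form "-b 200", 1 for the form
   "-b200", and 0 if argv[j] is not a well-formed -b option. */
static int parse_b_option(int argc, char *argv[], int j, int *buf_size)
{
    assert(argv != NULL && buf_size != NULL && j < argc);
    if (argv[j][0] != '-' || argv[j][1] != 'b') return 0;
    if (argv[j][2] == '\0') {
        /* size is given as the next argument */
        if (j + 1 < argc && sscanf(argv[j + 1], "%d", buf_size) == 1) return 2;
        return 0;
    }
    /* size follows the letter immediately, with no space */
    return sscanf(argv[j] + 2, "%d", buf_size) == 1 ? 1 : 0;
}
```

Both `-b 200` and `-b200` are accepted by this sketch, mirroring the two branches of the loop in section 137.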

```
buf_ptr: Char *, §33.
dpanic = macro (), §45.
exit: void (), <stdlib.h>.
FILE, <stdio.h>.
filename: Char *[], §37.
filename_count: int, §37.
FILENAME_MAX = macro, <stdio.h>.
flush_listing_line: void (), §41.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
line_listed: bool, §36.
listing_bits: unsigned char, §43.
listing_clear: void (), §44.
lop_pre = #9, §24.
mmo_cur_file: int, §51.
mmo_lop: void (), §48.
mmo_tetra: void (), §48.
sprintf: int (), <stdio.h>.
sscanf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
time: time_t (), <time.h>.
```

```
142. ⟨ Finish the assembly 142 ⟩ ≡
  if (lreg ≥ greg)
     dpanic("Danger:␣Must␣reduce␣the␣number␣of␣GREGs␣by␣%d", lreg - greg + 1);
  ⟨ Output the postamble 144 ⟩;
  ⟨ Check and output the trie 80 ⟩;
  ⟨ Report any undefined local symbols 145 ⟩;
  if (err_count) {
     if (err_count > 1) fprintf(stderr, "(%d␣errors␣were␣found.)\n", err_count);
     else fprintf(stderr, "(One␣error␣was␣found.)\n");
  }
  exit(err_count);
This code is used in section 136.
143. ⟨ Global variables 27 ⟩ +≡
  int greg = 255;    /* global register allocator */
  int cur_greg;    /* global register just allocated */
  int lreg = 32;    /* local register allocator */
144. ⟨ Output the postamble 144 ⟩ ≡
  mmo_lop(lop_post, 0, greg);
  greg_val[255] = trie_search(trie_root, "Main")→sym→equiv;
  for (j = greg; j < 256; j++) {
     mmo_tetra(greg_val[j].h);
     mmo_tetra(greg_val[j].l);
  }
This code is used in section 142.
145. ⟨ Report any undefined local symbols 145 ⟩ ≡
  for (j = 0; j < 10; j++)
     if (forward_local[j].link)
        err_count++, fprintf(stderr, "undefined␣local␣symbol␣%dF\n", j);
This code is used in section 142.
```

dpanic = macro (), §45.
equiv: octa, §58.
err\_count: int, §46.
exit: void (), <stdlib.h>.
forward\_local: sym\_node [], §90.
fprintf: int (), <stdio.h>.
greg\_val: octa [], §133.
h: tetra, §26.
j: register int, §136.
l: tetra, §26.
link: sym\_node \*, §58.
lop\_post = #a, §24.
mmo\_lop: void (), §48.
mmo\_tetra: void (), §48.
stderr: FILE \*, <stdio.h>.
sym: sym\_node \*, §54.
trie\_root: trie\_node \*, §56.
trie\_search: trie\_node \*(), §57.

## 146. Names of the sections.

```
⟨ Align the location pointer 107 ⟩ Used in section 102.
⟨ Allocate a global register 108 ⟩ Used in section 102.
⟨ Assemble instructions to put supplementary data in $255 128 ⟩ Used in section 127.
⟨ Assemble XYZ as a future reference and goto assemble_inst 130 ⟩ Used in section 129.
⟨ Assemble XYZ as a relative address and goto assemble_inst 131 ⟩ Used in section 129.
⟨ Assemble YZ as a future reference and goto assemble_X 125 ⟩ Used in section 124.
⟨ Assemble YZ as a memory address and goto assemble_X 127 ⟩ Used in section 124.
⟨ Assemble YZ as a relative address and goto assemble_X 126 ⟩ Used in section 124.
⟨ Cases for binary operators 99, 101 ⟩ Used in section 98.
⟨ Cases for unary operators 100 ⟩ Used in section 98.
⟨ Check and output the trie 80 ⟩ Used in section 142.
⟨ Check for a line directive 38 ⟩ Used in section 34.
⟨ Copy the operand field 106 ⟩ Used in section 102.
⟨ Deal with cases where val_stack[j] is impure 118 ⟩ Used in section 117.
⟨ Define the label 109 ⟩ Used in section 102.
⟨ Do a many-operand operation 117 ⟩ Used in section 116.
⟨ Do a one-operand operation 129 ⟩ Used in section 116.
⟨ Do a pseudo-operation and goto bypass 132 ⟩ Used in section 129.
⟨ Do a three-operand operation 119 ⟩ Used in section 116.
⟨ Do a two-operand operation 124 ⟩ Used in section 116.
⟨ Do the operation 116 ⟩ Used in section 102.
⟨ Do the X field 123 ⟩ Used in section 119.
⟨ Do the Y field 122 ⟩ Used in section 119.
⟨ Do the Z field 121 ⟩ Used in section 119.
⟨ Encode the length of t→sym→equiv 76 ⟩ Used in section 74.
⟨ Find the symbol table node, pp 111 ⟩ Used in section 109.
⟨ Finish the assembly 142 ⟩ Used in section 136.
⟨ Fix a future reference from a relative address 114 ⟩ Used in section 112.
⟨ Fix a future reference from an octabyte 113 ⟩ Used in section 112.
⟨ Fix prior references to this label 112 ⟩ Used in section 109.
⟨ Fix references that might be in the val_stack 110 ⟩ Used in section 109.
⟨ Flush the excess part of an overlong line 35 ⟩ Used in section 34.
⟨ Get the next line of input text, or break if the input has ended 34 ⟩ Used in section 136.
⟨ Global variables 27, 33, 36, 37, 43, 46, 51, 56, 60, 63, 67, 69, 77, 83, 90, 105, 120, 133, 139, 143 ⟩ Used in section 136.
⟨ Initialize everything 29, 32, 61, 71, 84, 91, 140 ⟩ Used in section 136.
⟨ Local variables 40, 65 ⟩ Used in section 136.
⟨ Make listing for GREG 134 ⟩ Used in section 132.
⟨ Make special listing to show the label equivalent 115 ⟩ Used in section 109.
⟨ Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53 ⟩ Used in section 52.
⟨ Open the files 138 ⟩ Used in section 140.
⟨ Output the postamble 144 ⟩ Used in section 142.
⟨ Output the preamble 141 ⟩ Used in section 140.
⟨ Perform the top operation on op_stack 98 ⟩ Used in section 85.
⟨ Preprocessor definitions 31, 39 ⟩ Used in section 136.
⟨ Print symbol sym_buf and its equivalent 78 ⟩ Used in section 75.
⟨ Process the command line 137 ⟩ Used in section 136.
⟨ Process the next MMIXAL instruction or comment 102 ⟩ Used in section 136.
⟨ Put other predefined symbols into the trie 70 ⟩ Used in section 61.
⟨ Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64 ⟩ Used in section 61.
⟨ Put the special register names into the trie 66 ⟩ Used in section 61.
⟨ Report an undefined symbol 79 ⟩ Used in section 74.
⟨ Report any undefined local symbols 145 ⟩ Used in section 142.
⟨ Scan a backward local 89 ⟩ Used in section 86.
⟨ Scan a binary operator or closing token, rt_op 97 ⟩ Used in section 85.
⟨ Scan a character constant 92 ⟩ Used in section 86.
⟨ Scan a decimal constant 94 ⟩ Used in section 86.
⟨ Scan a forward local 88 ⟩ Used in section 86.
⟨ Scan a hexadecimal constant 95 ⟩ Used in section 86.
⟨ Scan a string constant 93 ⟩ Used in section 86.
⟨ Scan a symbol 87 ⟩ Used in section 86.
⟨ Scan opening tokens until putting something on val_stack 86 ⟩ Used in section 85.
⟨ Scan the current location 96 ⟩ Used in section 86.
⟨ Scan the label field; goto bypass if there is none 103 ⟩ Used in section 102.
⟨ Scan the opcode field; goto bypass if there is none 104 ⟩ Used in section 102.
⟨ Scan the operand field 85 ⟩ Used in section 102.
⟨ Subroutines 28, 41, 42, 44, 45, 47, 48, 49, 50, 52, 55, 57, 59, 73, 74 ⟩ Used in section 136.
⟨ Type definitions 26, 30, 54, 58, 62, 68, 82 ⟩ Used in section 136.
⟨ Visit t and traverse t→mid 75 ⟩ Used in section 74.
```

**1. Introduction.** This CWEB program simulates how the MMIX computer might be implemented with a high-performance pipeline in many different configurations. All of the complexities of MMIX's architecture are treated, except for multiprocessing and low-level details of memory-mapped input/output.

The present program module, which contains the main routine for the MMIX meta-simulator, is primarily devoted to administrative tasks. Other modules do the actual work after this module has told them what to do.

**2.** A user typically invokes the meta-simulator with a UNIX-like command line of the general form 'mmmix configfile progfile', where the configfile describes the characteristics of an MMIX implementation and the progfile contains a program to be downloaded and run. Rules for configuration files appear in the module called mmix-config. The program file is either an "MMIX binary file" dumped by MMIX-SIM, or an ASCII text file that describes hexadecimal data in a rudimentary format. It is assumed to be binary if its name ends with the extension '.mmb'.

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mmix-pipe.h"
  char *config_file_name, *prog_file_name;
  ⟨ Global variables 5 ⟩
  ⟨ Subroutines 10 ⟩
  int main(argc, argv)
       int argc;
       char *argv[];
  {
    ⟨ Parse the command line 3 ⟩;
    MMIX_config(config_file_name);
    MMIX_init();
    mmix_io_init();
    ⟨ Input the program 4 ⟩;
    ⟨ Run the simulation interactively 13 ⟩;
    printf("Simulation␣ended␣at␣time␣%d.\n", ticks.l);
    print_stats();
    return 0;
  }
```

**3.** The command line might also contain options, some day. For now I'm forgetting them and simplifying everything until I gain further experience.

```
⟨ Parse the command line 3 ⟩ ≡
  if (argc ≠ 3) {
    fprintf(stderr, "Usage:␣%s␣configfile␣progfile\n", argv[0]);
    exit(-3);
  }
  config_file_name = argv[1];
  prog_file_name = argv[2];
```


MMIX\_init: void (), MMIX-PIPE §10.
mmix\_io\_init: void (), MMIX-IO §7.
print\_stats: void (), MMIX-PIPE §162.
printf: int (), <stdio.h>.
prog\_file: FILE \*, §5.
stderr: FILE \*, <stdio.h>.
strcmp: int (), <string.h>.
strlen: size\_t (), <string.h>.
ticks: Extern octa, MMIX-PIPE §87.

**5. Hexadecimal input to memory.** A rudimentary hexadecimal input format is implemented here so that the simulator can be run with essentially arbitrary data in the simulated memory. The rules of this format are extremely simple: Each line of the file either begins with (i) 12 hexadecimal digits followed by a colon; or (ii) a space followed by 16 hexadecimal digits. In case (i), the 12 hex digits specify a 48-bit physical address, called the current location. In case (ii), the 16 hex digits specify an octabyte to be stored in the current location; the current location is then increased by 8. The current location should be a multiple of 8, but its three least significant bits are actually ignored. Arbitrary comments can follow the specification of a new current location or a new octabyte, as long as each line is less than 99 characters long. For example, the file

```
0123456789ab: SILLY EXAMPLE
 0123456789abcdef first octabyte
 fedcba9876543210 second
```

places the octabyte #0123456789abcdef into memory location #0123456789a8 and #fedcba9876543210 into location #0123456789b0.

```
#define BUF_SIZE 100
⟨ Global variables 5 ⟩ ≡
  octa cur_loc;
  octa cur_dat;
  bool new_chunk;
  char buffer[BUF_SIZE];
  FILE *prog_file;
See also sections 16 and 25.
This code is used in section 2.
6. ⟨ Input a rudimentary hexadecimal file 6 ⟩ ≡
  {
     prog_file = fopen(prog_file_name, "r");
     if (¬prog_file) {
        fprintf(stderr, "Panic:␣Can't␣open␣MMIX␣hexadecimal␣file␣%s!\n",
             prog_file_name);
        exit(-3);
     }
     new_chunk = true;
     while (1) {
        if (¬fgets(buffer, BUF_SIZE, prog_file)) break;
        if (buffer[strlen(buffer) - 1] ≠ '\n') {
           fprintf(stderr, "Panic:␣Hexadecimal␣file␣line␣too␣long:␣'%s...'!\n", buffer);
           exit(-3);
        }
        if (buffer[12] ≡ ':') ⟨ Change the current location 7 ⟩
        else if (buffer[0] ≡ '␣') ⟨ Read an octabyte and advance cur_loc 8 ⟩
        else {
           fprintf(stderr, "Panic:␣Improper␣hexadecimal␣file␣line:␣'%s'!\n", buffer);
           exit(-3);
        }
     }
  }
This code is used in section 4.
7. ⟨ Change the current location 7 ⟩ ≡
  {
     if (sscanf(buffer, "%4x%8x", &cur_loc.h, &cur_loc.l) ≠ 2) {
        fprintf(stderr, "Panic:␣Improper␣hexadecimal␣file␣location:␣'%s'!\n", buffer);
        exit(-3);
     }
     new_chunk = true;
  }
This code is used in section 6.
8. ⟨ Read an octabyte and advance cur_loc 8 ⟩ ≡
  {
     if (sscanf(buffer + 1, "%8x%8x", &cur_dat.h, &cur_dat.l) ≠ 2) {
        fprintf(stderr, "Panic:␣Improper␣hexadecimal␣file␣data:␣'%s'!\n", buffer);
        exit(-3);
     }
     if (new_chunk) mem_write(cur_loc, cur_dat);
     else mem_hash[last_h].chunk[(cur_loc.l & #ffff) ≫ 3] = cur_dat;
     cur_loc.l += 8;
     if ((cur_loc.l & #fff8) ≠ 0) new_chunk = false;
     else {
        new_chunk = true;
        if ((cur_loc.l & #ffff0000) ≡ 0) cur_loc.h++;
     }
  }
This code is used in section 6.
```
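
The two line types of the rudimentary hexadecimal format can also be tried out apart from the simulator. The sketch below is an illustration only: the type `hex_octa` and the functions `classify_hex_line` and `store_address_low` are hypothetical names introduced here, standing in for the simulator's **octa** type and the inline logic of sections 6 through 8. It uses the same `sscanf` formats as the code above and shows that the three low-order address bits are dropped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the simulator's octa: two 32-bit halves. */
typedef struct { unsigned int h, l; } hex_octa;

/* Classify one input line. Returns 1 if it sets a new current location
   (12 hex digits and a colon), 2 if it supplies an octabyte (a space and
   16 hex digits), and 0 if the line is improper. */
static int classify_hex_line(const char *line, hex_octa *loc, hex_octa *dat)
{
    assert(line != NULL && loc != NULL && dat != NULL);
    if (strlen(line) > 12 && line[12] == ':')
        return sscanf(line, "%4x%8x", &loc->h, &loc->l) == 2 ? 1 : 0;
    if (line[0] == ' ')
        return sscanf(line + 1, "%8x%8x", &dat->h, &dat->l) == 2 ? 2 : 0;
    return 0;
}

/* Low half of the address actually used for storing: the three least
   significant bits of the current location are ignored. */
static unsigned int store_address_low(const hex_octa *loc)
{
    return loc->l & ~7u;
}
```

Applied to the example file of section 5, the first line sets the current location to #0123456789ab, and the first octabyte is therefore stored at #0123456789a8.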

```
bool = enum, MMIX-PIPE §11.
chunk: octa *, MMIX-PIPE §206.
exit: void (), <stdlib.h>.
false = 0, MMIX-PIPE §11.
fgets: char *(), <stdio.h>.
FILE, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
h: tetra, MMIX-PIPE §17.
l: tetra, MMIX-PIPE §17.
last_h: int, MMIX-PIPE §211.
mem_hash: chunknode *, MMIX-PIPE §207.
mem_write: void (), MMIX-PIPE §213.
octa = struct, MMIX-PIPE §17.
prog_file_name, §2.
sscanf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
true = 1, MMIX-PIPE §11.
```

This code is used in section 4.

**10.** The *undump\_octa* routine reads eight bytes from the binary file *prog\_file* into the global octabyte *cur\_dat*, taking care as usual to be big-endian regardless of the host computer's bias.

```
⟨ Subroutines 10 ⟩ ≡
  static bool undump_octa ARGS((void));
  static bool undump_octa()
  {
    register int t0, t1, t2, t3;
    t0 = fgetc(prog_file); if (t0 ≡ EOF) return false;
    t1 = fgetc(prog_file); if (t1 ≡ EOF) goto oops;
    t2 = fgetc(prog_file); if (t2 ≡ EOF) goto oops;
    t3 = fgetc(prog_file); if (t3 ≡ EOF) goto oops;
    cur_dat.h = (t0 ≪ 24) + (t1 ≪ 16) + (t2 ≪ 8) + t3;
    t0 = fgetc(prog_file); if (t0 ≡ EOF) goto oops;
    t1 = fgetc(prog_file); if (t1 ≡ EOF) goto oops;
    t2 = fgetc(prog_file); if (t2 ≡ EOF) goto oops;
    t3 = fgetc(prog_file); if (t3 ≡ EOF) goto oops;
    cur_dat.l = (t0 ≪ 24) + (t1 ≪ 16) + (t2 ≪ 8) + t3;
    return true;
  oops: fprintf(stderr, "Premature␣end␣of␣file␣on␣%s!\n", prog_file_name);
    return false;
  }
See also sections 17 and 20.
This code is used in section 2.
11. ⟨ Input consecutive octabytes beginning at cur_loc 11 ⟩ ≡
  while (1) {
     if (¬undump_octa()) {
        fprintf(stderr, "Unexpected␣end␣of␣file␣on␣%s!\n", prog_file_name);
        break;
     }
     if (¬(cur_dat.h ∨ cur_dat.l)) break;
     if (bad_address) {
        fprintf(stderr, "Panic:␣Unsupported␣virtual␣address␣%08x%08x!\n", cur_loc.h,
             cur_loc.l);
        exit(-5);
     }
     if (new_chunk) mem_write(cur_loc, cur_dat);
     else mem_hash[last_h].chunk[(cur_loc.l & #ffff) ≫ 3] = cur_dat;
     cur_loc.l += 8;
     if ((cur_loc.l & #fff8) ≠ 0) new_chunk = false;
     else {
        new_chunk = true;
        if ((cur_loc.l & #ffff0000) ≡ 0) {
           bad_address = true;
           cur_loc.h = (cur_loc.h ≪ 29) + 1;
        }
     }
  }
This code is used in section 9.
```
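
The big-endian assembly performed by *undump\_octa* does not depend on file input, so it can be checked on a byte array. The following is a minimal sketch under stated assumptions: the type `be_octa` and the function `octa_from_bytes` are hypothetical names, and fixed-width unsigned types from `<stdint.h>` are used in place of the simulator's **tetra** type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an octabyte held as two 32-bit halves. */
typedef struct { uint32_t h, l; } be_octa;

/* Combine four bytes, most significant first, into one 32-bit value. */
static uint32_t tetra_from_bytes(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Assemble eight bytes into an octabyte in big-endian order, whatever the
   byte order of the host. Returns 0 if fewer than 8 bytes are available,
   which corresponds to a premature end of file. */
static int octa_from_bytes(const unsigned char *bytes, size_t len, be_octa *out)
{
    assert(bytes != NULL && out != NULL);
    if (len < 8) return 0;
    out->h = tetra_from_bytes(bytes);      /* bytes 0..3: high tetrabyte */
    out->l = tetra_from_bytes(bytes + 4);  /* bytes 4..7: low tetrabyte */
    return 1;
}
```

Because each byte is shifted into place explicitly, the result is the same on little-endian and big-endian hosts.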

```
ARGS = macro (), MMIX-PIPE §6.
bad_address: bool, §25.
bool = enum, MMIX-PIPE §11.
chunk: octa *, MMIX-PIPE §206.
cur_dat: octa, §5.
cur_loc: octa, §5.
EOF = (-1), <stdio.h>.
exit: void (), <stdlib.h>.
false = 0, MMIX-PIPE §11.
fgetc: int (), <stdio.h>.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
h: tetra, MMIX-PIPE §17.
l: tetra, MMIX-PIPE §17.
last_h: int, MMIX-PIPE §211.
mem_hash: chunknode *, MMIX-PIPE §207.
mem_write: void (), MMIX-PIPE §213.
new_chunk: bool, §5.
prog_file: FILE *, §5.
prog_file_name, §2.
stderr: FILE *, <stdio.h>.
true = 1, MMIX-PIPE §11.
```

12. The primitive operating system assumed in simple programs of *The Art of Computer Programming* will set up text segment, data segment, pool segment, and stack segment as in MMIX-SIM. The runtime stack will be initialized if we UNSAVE from the last location loaded in the .mmb file.

```
#define rQ 16
⟨ Set up the canned environment 12 ⟩ ≡
  if (cur_loc.h ≠ 3) {
     fprintf(stderr, "Panic:␣MMIX␣binary␣file␣didn't␣set␣up␣the␣stack!\n");
     exit(-6);
  }
  inst_ptr.o = mem_read(incr(cur_loc, -8 * 14));    /* Main */
  inst_ptr.p = Λ;
  cur_loc.h = #60000000;
  g[255].o = incr(cur_loc, -8);    /* place to UNSAVE */
  cur_dat.l = #90;
  if (mem_read(cur_dat).h) inst_ptr.o = cur_dat;    /* start at #90 if nonzero */
  head→inst = (UNSAVE ≪ 24) + 255, tail--;    /* prefetch a fabricated command */
  head→loc = incr(inst_ptr.o, -4);    /* in case the UNSAVE is interrupted */
  g[rT].o.h = #80000005, g[rTT].o.h = #80000006;
  cur_dat.h = (RESUME ≪ 24) + 1, cur_dat.l = 0, cur_loc.h = 5, cur_loc.l = 0;
  mem_write(cur_loc, cur_dat);    /* the primitive trap handler */
  cur_dat.l = cur_dat.h, cur_dat.h = (NEGI ≪ 24) + (255 ≪ 16) + 1;
  cur_loc.h = 6, cur_loc.l = 8;
  mem_write(cur_loc, cur_dat);    /* the primitive dynamic trap handler */
  cur_dat.h = (GET ≪ 24) + rQ, cur_dat.l = (PUTI ≪ 24) + (rQ ≪ 16), cur_loc.l = 0;
  mem_write(cur_loc, cur_dat);    /* more of the primitive dynamic trap handler */
  cur_dat.h = 0, cur_dat.l = 7;    /* generate a PTE with rwx permission */
  cur_loc.h = 4;    /* beginning of skeleton page table */
  mem_write(cur_loc, cur_dat);    /* PTE for the text segment */
  ITcache→set[0][0].tag = zero_octa;
  ITcache→set[0][0].data[0] = cur_dat;    /* prime the IT cache */
  cur_dat.l = 6;    /* PTE with read and write permission only */
  cur_dat.h = 1, cur_loc.l = 3 ≪ 13;
  mem_write(cur_loc, cur_dat);    /* PTE for the data segment */
  cur_dat.h = 2, cur_loc.l = 6 ≪ 13;
  mem_write(cur_loc, cur_dat);    /* PTE for the pool segment */
  cur_dat.h = 3, cur_loc.l = 9 ≪ 13;
  mem_write(cur_loc, cur_dat);    /* PTE for the stack segment */
  g[rK].o = neg_one;    /* enable all interrupts */
  g[rV].o.h = #369c2004;
  page_bad = false, page_r = 4 ≪ (32 - 13), page_s = 32, page_mask.l = #ffffffff;
  page_b[1] = 3, page_b[2] = 6, page_b[3] = 9, page_b[4] = 12;
This code is used in section 9.
```

 $cur\_dat$ : **octa**, §5.  $cur\_loc:$  **octa**, §5. data: octa \*, MMIX-PIPE §167. exit: void (), <stdlib.h>. false = 0, mmix-pipe §11. fprintf: int (), <stdio.h>.  $g: \mathbf{int}, \text{MMIX-PIPE } 167.$ GET =  $^{\#}$ fe, MMIX-PIPE §47.  $h: \mathbf{tetra}, \text{MMIX-PIPE} \S 17.$ head: **fetch** \*, MMIX-PIPE §69. incr: octa (), MMIX-ARITH §6. inst: tetra, MMIX-PIPE §68.  $inst\_ptr: \mathbf{spec}, \text{MMIX-PIPE} \S 284.$ ITcache: cache \*, MMIX-PIPE §168. l: tetra, MMIX-PIPE §17.

loc: octa, MMIX-PIPE §44.

mem\_read: octa (),

MMIX-PIPE §210.

mem\_write: void (),

MMIX-PIPE §213.

neg\_one: octa, MMIX-ARITH §4.

NEGI = #35, MMIX-PIPE §47.

o: octa, MMIX-PIPE §40.

p: specnode \*, MMIX-PIPE §40.

page\_b: int [], MMIX-PIPE §238.

page\_bad: bool,

MMIX-PIPE §238.

page\_mask: octa,

MMIX-PIPE §238.

page\_r: int, MMIX-PIPE §238.

 $page_s: int, \text{MMIX-PIPE } \$238.$  PUTI = #f7, MMIX-PIPE \$47. RESUME = #f9, MMIX-PIPE \$47. rK = 15, MMIX-PIPE \$52. rT = 13, MMIX-PIPE \$52. rT = 14, MMIX-PIPE \$52. rV = 18, MMIX-PIPE \$52. set: cacheset \*, mmix-pipe \$167. stderr: FILE \*, <stdio.h>. tag: octa, mmix-pipe \$167. tail: fetch \*, mmix-pipe \$47. tail: fetch \*, mmix-pipe \$47.

**13. Interaction.** When prompted for instructions, this simulator understands the following terse commands:

- ⟨positive integer⟩: Run for this many clock cycles.
- @⟨hexadecimal integer⟩: Set the instruction pointer to this virtual address; successive instructions will be fetched from here.
- k: Toggle the sign bit of the instruction pointer.
- b⟨hexadecimal integer⟩: Set the breakpoint to this virtual address; simulation will pause when an instruction from the breakpoint address enters the fetch buffer.
- v⟨hexadecimal integer⟩: Set the desired level of diagnostic output; each bit in the hexadecimal integer enables certain printouts when the simulator is running. Bit #1 shows instructions when issued, deissued, or committed; #2 shows the pipeline and locks after each cycle; #4 shows each coroutine activation; #8 each coroutine scheduling; #10 reports when reading from an uninitialized chunk of memory; #20 asks for online input when reading from addresses $\geq 2^{48}$; #40 reports all I/O to memory address $\geq 2^{48}$; #80 shows details of branch prediction; #100 displays full cache contents including blocks with invalid tags.
- -⟨integer⟩: Deissue this many instructions.
- l⟨integer⟩ or g⟨integer⟩: Show current "hot" contents of a local or global register.
- m⟨hexadecimal integer⟩: Show current contents of a physical memory address. (This value may not be up to date; newer values might appear in the write buffer and/or in the caches.)
- f⟨hexadecimal integer⟩: Insert a tetrabyte into the fetch buffer. (Use with care!)
- i⟨integer⟩: Set the interval counter rI to the given value; this will trigger an interrupt after the specified number of cycles.
- IT, DT, I, D, or S: Show current contents of a cache.
- D\* or S\*: Show dirty blocks of a cache.
- p: Show current contents of the pipeline.
- s: Show current statistics on branch prediction and speed of instruction issue.
- h: Help (show the possibilities for interaction).
- q: Quit.
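
The bit assignments listed for the v command can be summarized as a small table. The sketch below is illustrative only and is not taken from the simulator: the table `diag_bits` and the function `describe_verbosity` are hypothetical names, and the descriptions paraphrase the list above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical table pairing each verbosity bit with its meaning,
   following the description of the v command. */
struct diag_bit { unsigned int mask; const char *meaning; };

static const struct diag_bit diag_bits[] = {
    {0x001, "instructions when issued, deissued, or committed"},
    {0x002, "pipeline and locks after each cycle"},
    {0x004, "each coroutine activation"},
    {0x008, "each coroutine scheduling"},
    {0x010, "reads from an uninitialized chunk of memory"},
    {0x020, "online input when reading from addresses >= 2^48"},
    {0x040, "all I/O to memory addresses >= 2^48"},
    {0x080, "details of branch prediction"},
    {0x100, "full cache contents, including blocks with invalid tags"}
};

/* Print one line per enabled diagnostic; return how many are enabled. */
static int describe_verbosity(unsigned int v, FILE *out)
{
    size_t i;
    int count = 0;
    assert(out != NULL);
    for (i = 0; i < sizeof diag_bits / sizeof diag_bits[0]; i++)
        if (v & diag_bits[i].mask) {
            fprintf(out, "#%x: %s\n", diag_bits[i].mask, diag_bits[i].meaning);
            count++;
        }
    return count;
}
```

For instance, the command `v81` would correspond to the two entries with masks #1 and #80 in this table.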

```
⟨ Run the simulation interactively 13 ⟩ ≡
  while (1) {
    printf("mmmix>␣"); fflush(stdout);
    fgets(buffer, BUF_SIZE, stdin);
    switch (buffer[0]) {
    default: what_say:
      printf("Eh?␣Sorry,␣I␣don't␣understand.␣(Type␣h␣for␣help)\n");
      continue;
    case 'q': case 'x': goto done;
    ⟨ Cases for interaction 14 ⟩
    }
  }
  done:
```

This code is used in section 2.

```
14. ⟨ Cases for interaction 14 ⟩ ≡
case 'h': case '?': printf("The␣interactive␣commands␣are␣as␣follows:\n");
  printf("␣<n>␣to␣run␣for␣n␣cycles\n");
  printf("␣@<x>␣to␣take␣next␣instruction␣from␣location␣x\n");
  printf("␣k␣to␣change␣the␣sign␣bit␣of␣the␣instruction␣location\n");
  printf("␣b<x>␣to␣pause␣when␣location␣x␣is␣fetched\n");
  printf("␣v<x>␣to␣print␣specified␣diagnostics␣when␣running;\n");
  printf("␣␣␣␣␣␣10[uninitialized␣read]+20[online␣I/O␣read]+\n");
  printf("␣␣␣␣␣␣40[I/O␣read/write]+80[branch␣prediction␣details]+\n");
  printf("␣␣␣␣␣␣100[invalid␣cache␣blocks␣displayed␣too]\n");
  printf("␣-<n>␣to␣deissue␣n␣instructions\n");
  printf("␣l<n>␣to␣print␣current␣value␣of␣local␣register␣n\n");
  printf("␣g<n>␣to␣print␣current␣value␣of␣global␣register␣n\n");
  printf("␣m<x>␣to␣print␣current␣value␣of␣memory␣address␣x\n");
  printf("␣f<x>␣to␣insert␣instruction␣x␣into␣the␣fetch␣buffer\n");
  printf("␣i<n>␣to␣initiate␣a␣timer␣interrupt␣after␣n␣cycles\n");
  printf("␣IT,␣DT,␣I,␣D,␣or␣S␣to␣print␣current␣cache␣contents\n");
  printf("␣D*␣or␣S*␣to␣print␣dirty␣blocks␣of␣a␣cache\n");
  printf("␣p␣to␣print␣current␣pipeline␣contents\n");
  printf("␣s␣to␣print␣current␣stats\n");
  printf("␣h␣to␣print␣this␣message\n");
  printf("␣q␣to␣exit\n");
  printf("(Here␣<n>␣is␣a␣decimal␣integer,␣<x>␣is␣hexadecimal.)\n");
  continue;
```

See also sections 15, 18, 19, 21, 22, 23, and 24.

This code is used in section 13.

BUF_SIZE = 100, §5. buffer: char [], §5. fflush: int (), <stdio.h>. fgets: char \*(), <stdio.h>. printf: int (), <stdio.h>. stdin: FILE \*, <stdio.h>. stdout: FILE \*, <stdio.h>.

```
15. ⟨ Cases for interaction 14 ⟩ +≡
case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7':
  case '8': case '9':
  if (sscanf(buffer, "%d", &n) ≠ 1) goto what_say;
  printf("Running %d at time %d", n, ticks.l);
  if (bp.h ≡ (tetra) -1 ∧ bp.l ≡ (tetra) -1) printf("\n");
  else printf(" with breakpoint %08x%08x\n", bp.h, bp.l);
  MMIX_run(n, bp); continue;
case '@': inst_ptr.o = read_hex(buffer + 1); goto new_inst_ptr;
case 'k': inst_ptr.o.h ⊕= #80000000;  /* shortcut to kernel mode */
  if (¬ticks.l ∧ head) head→loc.h ⊕= #80000000;  /* fix the UNSAVE loc */
new_inst_ptr: if (inst_ptr.o.h & #80000000) g[rK].o.h &= -2;
        /* disable interrupts on P_BIT */
  inst_ptr.p = Λ; continue;
case 'b': bp = read_hex(buffer + 1); continue;
case 'v': verbose = read_hex(buffer + 1).l; continue;
16. ⟨ Global variables 5 ⟩ +≡
  int n, m;  /* temporary integer */
  octa bp = {-1, -1};  /* breakpoint */
  octa tmp;  /* an octabyte of temporary interest */
  static unsigned char d[BUF_SIZE];
```

**17.** Here's a simple program to read an octabyte in hexadecimal notation from a buffer. It changes the buffer by storing a null character after the input.

```
⟨ Subroutines 10 ⟩ +≡
  octa read_hex ARGS((char *));
  octa read_hex(p)
        char *p;
  {
     register int j, k;
     octa val;
     val.h = val.l = 0;
     for (j = 0; ; j++) {
        if (p[j] ≥ '0' ∧ p[j] ≤ '9') d[j] = p[j] - '0';
        else if (p[j] ≥ 'a' ∧ p[j] ≤ 'f') d[j] = p[j] - 'a' + 10;
        else if (p[j] ≥ 'A' ∧ p[j] ≤ 'F') d[j] = p[j] - 'A' + 10;
        else break;
     }
     p[j] = '\0';
     for (j--, k = 0; k ≤ j; k++) {
        if (k ≥ 8) val.h += d[j - k] ≪ (4 * k - 32);
        else val.l += d[j - k] ≪ (4 * k);
     }
     return val;
  }
18. ⟨ Cases for interaction 14 ⟩ +≡
case '-': if (sscanf(buffer + 1, "%d", &n) ≠ 1 ∨ n < 0) goto what_say;
  if (cool ≤ hot) m = hot - cool; else m = (hot - reorder_bot) + 1 + (reorder_top - cool);
  if (n > m) deissues = m; else deissues = n;
  continue;
case 'l': if (sscanf(buffer + 1, "%d", &n) ≠ 1 ∨ n < 0) goto what_say;
  if (n ≥ lring_size) goto what_say;
  printf("  l[%d]=%08x%08x\n", n, l[n].o.h, l[n].o.l); continue;
case 'm': tmp = mem_read(read_hex(buffer + 1));
  printf("  m[%s]=%08x%08x\n", buffer + 1, tmp.h, tmp.l); continue;
```

**19.** The register stack pointers, rO and rS, are not kept up to date in the g array. Therefore we have to deduce their values by examining the pipeline.

```
⟨ Cases for interaction 14 ⟩ +≡
case 'g': if (sscanf(buffer + 1, "%d", &n) ≠ 1 ∨ n < 0) goto what_say;
  if (n ≥ 256) goto what_say;
  if (n ≡ rO ∨ n ≡ rS) {
     if (hot ≡ cool)  /* pipeline empty */
        g[rO].o = sl3(cool_O), g[rS].o = sl3(cool_S);
     else g[rO].o = sl3(hot→cur_O), g[rS].o = sl3(hot→cur_S);
  }
  printf("  g[%d]=%08x%08x\n", n, g[n].o.h, g[n].o.l);
  continue;
20. ⟨ Subroutines 10 ⟩ +≡
  static octa sl3 ARGS((octa));
  static octa sl3(y)  /* shift left by 3 bits */
        octa y;
  {
     register tetra yhl = y.h ≪ 3, ylh = y.l ≫ 29;
     y.h = yhl + ylh; y.l ≪= 3;
     return y;
  }
```
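The two subroutines above, `read_hex` and `sl3`, both treat an octabyte as a pair of 32-bit halves. The following sketch restates them with fixed-width types so they can be tried in isolation. It is not the author's code: `parse_hex` accumulates digits from left to right instead of storing them in the array `d`, and it leaves its argument unmodified. The names `parse_hex` and `shift_left_3` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;  /* high and low tetrabytes */

/* Convert the leading hexadecimal digits of s into an octabyte. */
octa parse_hex(const char *s)
{
    octa val = {0, 0};
    int digit;
    assert(s != NULL);
    for (; ; s++) {
        if (*s >= '0' && *s <= '9') digit = *s - '0';
        else if (*s >= 'a' && *s <= 'f') digit = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'F') digit = *s - 'A' + 10;
        else break;  /* the first non-digit ends the number */
        /* shift the 64-bit value left by one hex digit, then append */
        val.h = (val.h << 4) | (val.l >> 28);
        val.l = (val.l << 4) | (uint32_t)digit;
    }
    return val;
}

/* Shift an octabyte left by 3 bits; the top 3 bits of l move into h. */
octa shift_left_3(octa y)
{
    y.h = (y.h << 3) | (y.l >> 29);
    y.l <<= 3;
    return y;
}
```

For instance, `shift_left_3(parse_hex("20000000"))` carries the single set bit out of the low half, giving `h = 1` and `l = 0`.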

```
ARGS = macro (), MMIX-PIPE §6.
BUF_SIZE = 100, §5.
buffer: char [], §5.
cool: control *, MMIX-PIPE §60.
cool_O: octa, MMIX-PIPE §98.
cool_S: octa, MMIX-PIPE §98.
cur_O: octa, MMIX-PIPE §44.
cur_S: octa, MMIX-PIPE §44.
deissues: int, MMIX-PIPE §60.
g: int, MMIX-PIPE §167.
h: tetra, MMIX-PIPE §17.
head: fetch *, MMIX-PIPE §69.
hot: control *, MMIX-PIPE §60.
inst_ptr: spec, MMIX-PIPE §284.
l: tetra, MMIX-PIPE §17.
loc: octa, MMIX-PIPE §44.
lring_size: int, MMIX-PIPE §86.
mem_read: octa (), MMIX-PIPE §210.
MMIX_run: void (), MMIX-PIPE §10.
o: octa, MMIX-PIPE §40.
octa = struct, MMIX-PIPE §40.
p: specnode *, MMIX-PIPE §40.
P_BIT = 1 ≪ 0, MMIX-PIPE §54.
printf: int (), <stdio.h>.
reorder_bot: control *, MMIX-PIPE §60.
reorder_top: control *, MMIX-PIPE §60.
rK = 15, MMIX-PIPE §52.
rO = 10, MMIX-PIPE §52.
rS = 11, MMIX-PIPE §52.
sscanf: int (), <stdio.h>.
tetra = unsigned int, MMIX-PIPE §17.
ticks: Extern octa, MMIX-PIPE §87.
verbose: int, MMIX-PIPE §4.
what_say: label, §13.
```

```
21. ⟨ Cases for interaction 14 ⟩ +≡
case 'I': print_cache(buffer[1] ≡ 'T' ? ITcache : Icache, false); continue;
case 'D': print_cache(buffer[1] ≡ 'T' ? DTcache : Dcache,
   buffer[1] ≡ '*'); continue;
case 'S': print_cache(Scache, buffer[1] ≡ '*'); continue;
case 'p': print_pipe(); print_locks(); continue;
case 's': print_stats(); continue;
case 'i': if (sscanf(buffer + 1, "%d", &n) ≡ 1) g[rI].o = incr(zero_octa, n);
  continue;
22. ⟨ Cases for interaction 14 ⟩ +≡
case 'f': tmp = read_hex(buffer + 1);
  {
     register fetch *new_tail;
     if (tail ≡ fetch_bot) new_tail = fetch_top;
     else new_tail = tail - 1;
     if (new_tail ≡ head) printf("Sorry, the fetch buffer is full!\n");
     else {
        tail→loc = inst_ptr.o;
        tail→inst = tmp.l;
        tail→interrupt = 0;
        tail→noted = false;
        tail = new_tail;
     }
     continue;
  }
```

**23.** A hidden case here, for me when debugging. It essentially disables the translation caches, by mapping everything to zero.

```
⟨ Cases for interaction 14 ⟩ +≡
case 'd': if (ticks.l)
     printf("Sorry: I disable ITcache and DTcache only at the beginning!\n");
  else {
     ITcache→set[0][0].tag = zero_octa;
     ITcache→set[0][0].data[0] = seven_octa;
     DTcache→set[0][0].tag = zero_octa;
     DTcache→set[0][0].data[0] = seven_octa;
     g[rK].o = neg_one;
     page_bad = false;
     page_mask = neg_one;
     inst_ptr.p = (specnode *) 1;
  } continue;
```
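The `f` command treats the fetch buffer as a ring: a new entry is stored at `tail`, which then moves one slot toward lower addresses and wraps from `fetch_bot` to `fetch_top`. The buffer counts as full when the next tail position would coincide with `head`. A minimal sketch of that discipline, with a hypothetical four-slot ring and hypothetical names (`slot`, `ring_insert`), might look as follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOTS 4  /* hypothetical capacity; one slot always stays unused */

typedef struct { uint32_t inst; } slot;

static slot ring[SLOTS];
static slot *const ring_bot = ring;
static slot *const ring_top = ring + SLOTS - 1;
static slot *head = ring + SLOTS - 1;  /* oldest entry */
static slot *tail = ring + SLOTS - 1;  /* next free entry */

/* Store inst at tail and move tail down one slot, wrapping at the bottom.
   Returns false and changes nothing if the ring is already full. */
bool ring_insert(uint32_t inst)
{
    slot *new_tail = (tail == ring_bot) ? ring_top : tail - 1;
    assert(tail >= ring_bot && tail <= ring_top);
    if (new_tail == head) return false;  /* full: tail would catch up with head */
    tail->inst = inst;
    tail = new_tail;
    return true;
}
```

With four slots, three insertions succeed and the fourth is refused, because an empty ring and a full ring must remain distinguishable by comparing `head` and `tail` alone.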


**24.** And another case, for me when kludging. At the moment, it simply lists the functional unit names. But I might decide to put other stuff here when giving a demo.

```
⟨ Cases for interaction 14 ⟩ +≡
case '!': {
     register int j;
     for (j = 0; j < funit_count; j++) printf("unit %s %d\n", funit[j].name, funit[j].k);
  }
  continue;
25. ⟨ Global variables 5 ⟩ +≡
  bool bad_address;
  extern bool page_bad;
  extern octa page_mask;
  extern int page_r, page_s, page_b[5];
  extern octa zero_octa;
  extern octa neg_one;
  octa seven_octa = {0, 7};
  extern octa incr ARGS((octa y, int delta));  /* unsigned y + δ (δ is signed) */
  extern void mmix_io_init ARGS((void));
  extern void MMIX_config ARGS((char *));
```

```
ARGS = macro (), MMIX-PIPE §6.
bool = enum, MMIX-PIPE §11.
buffer: char [], §5.
data: octa *, MMIX-PIPE §167.
Dcache: cache *, MMIX-PIPE §168.
DTcache: cache *, MMIX-PIPE §168.
false = 0, MMIX-PIPE §11.
fetch = struct, MMIX-PIPE §68.
fetch_bot: fetch *, MMIX-PIPE §69.
fetch_top: fetch *, MMIX-PIPE §69.
funit: func *, MMIX-PIPE §77.
funit_count: int, MMIX-PIPE §77.
g: int, MMIX-PIPE §167.
head: fetch *, MMIX-PIPE §69.
Icache: cache *, MMIX-PIPE §168.
incr: octa (), MMIX-ARITH §6.
inst: tetra, MMIX-PIPE §68.
inst_ptr: spec, MMIX-PIPE §284.
interrupt: unsigned int, MMIX-PIPE §68.
ITcache: cache *, MMIX-PIPE §168.
k: register int, §17.
l: tetra, MMIX-PIPE §17.
loc: octa, MMIX-PIPE §44.
MMIX_config: void (), MMIX-CONFIG §38.
mmix_io_init: void (), MMIX-IO §7.
n: int, §16.
name: char *, MMIX-PIPE §167.
neg_one: octa, MMIX-ARITH §4.
noted: bool, MMIX-PIPE §68.
o: octa, MMIX-PIPE §40.
octa = struct, MMIX-PIPE §17.
p: specnode *, MMIX-PIPE §40.
page_b: int [], MMIX-PIPE §238.
page_bad: bool, MMIX-PIPE §238.
page_mask: octa, MMIX-PIPE §238.
page_r: int, MMIX-PIPE §238.
page_s: int, MMIX-PIPE §238.
print_cache: void (), MMIX-PIPE §176.
print_locks: void (), MMIX-PIPE §39.
print_pipe: void (), MMIX-PIPE §253.
print_stats: void (), MMIX-PIPE §162.
printf: int (), <stdio.h>.
read_hex: octa (), §17.
rI = 12, MMIX-PIPE §52.
rK = 15, MMIX-PIPE §52.
Scache: cache *, MMIX-PIPE §168.
set: cacheset *, MMIX-PIPE §167.
specnode = struct, MMIX-PIPE §40.
sscanf: int (), <stdio.h>.
tag: octa, MMIX-PIPE §167.
tail: fetch *, MMIX-PIPE §69.
ticks: Extern octa, MMIX-PIPE §87.
tmp: octa, §16.
zero_octa: octa, MMIX-ARITH §4.
```

## 26. Names of the sections.

```
⟨ Cases for interaction 14, 15, 18, 19, 21, 22, 23, 24 ⟩ Used in section 13.
⟨ Change the current location 7 ⟩ Used in section 6.
⟨ Global variables 5, 16, 25 ⟩ Used in section 2.
⟨ Input a rudimentary hexadecimal file 6 ⟩ Used in section 4.
⟨ Input an MMIX binary file 9 ⟩ Used in section 4.
⟨ Input consecutive octabytes beginning at cur_loc 11 ⟩ Used in section 9.
⟨ Input the program 4 ⟩ Used in section 2.
⟨ Parse the command line 3 ⟩ Used in section 2.
⟨ Read an octabyte and advance cur_loc 8 ⟩ Used in section 6.
⟨ Run the simulation interactively 13 ⟩ Used in section 2.
⟨ Set up the canned environment 12 ⟩ Used in section 9.
⟨ Subroutines 10, 17, 20 ⟩ Used in section 2.
```

1. Introduction. This program reads a binary mmo file output by the MMIXAL processor and lists it in human-readable form. It lists only the symbol table, if invoked with the -s option. It lists also the tetrabytes of input, if invoked with the -v option.

```
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
  ⟨ Prototype preparations 5 ⟩
  ⟨ Type definitions 7 ⟩
  ⟨ Global variables 4 ⟩
  ⟨ Subroutines 8 ⟩
  int main(argc, argv)
        int argc; char *argv[];
  {
     register int j, delta, postamble = 0;
     register char *p;
     ⟨ Process the command line 2 ⟩;
     ⟨ Initialize everything 3 ⟩;
     ⟨ List the preamble 23 ⟩;
     do ⟨ List the next item 13 ⟩ while (¬postamble);
     ⟨ List the postamble 24 ⟩;
     ⟨ List the symbol table 25 ⟩;
     return 0;
  }
2. ⟨ Process the command line 2 ⟩ ≡
  listing = 1, verbose = 0;
  for (j = 1; j < argc - 1 ∧ argv[j][0] ≡ '-' ∧ argv[j][2] ≡ '\0'; j++) {
     if (argv[j][1] ≡ 's') listing = 0;
     else if (argv[j][1] ≡ 'v') verbose = 1;
     else break;
  }
  if (j ≠ argc - 1) {
     exit(-1);
  }
```

This code is used in section 1.

```
3. ⟨ Initialize everything 3 ⟩ ≡
  mmo_file = fopen(argv[argc - 1], "rb");
  if (¬mmo_file) {
     fprintf(stderr, "Can't open file %s!\n", argv[argc - 1]);
     exit(-2);
  }
```

See also sections 12 and 17.

This code is used in section 1.
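The option loop of section 2 accepts any mixture of `-s` and `-v` before the file name and insists that exactly one argument remains afterward. The sketch below isolates that logic in a function. The name `parse_options` and its return convention are assumptions for illustration; it also adds a check that the character after `-` exists before looking at the one following it, which the original loop does not make.

```c
#include <assert.h>
#include <stddef.h>

/* Set *listing and *verbose from the options in argv.
   Returns 0 if exactly one argument (the file name) remains, else -1. */
int parse_options(int argc, char *argv[], int *listing, int *verbose)
{
    int j;
    assert(listing != NULL && verbose != NULL);
    *listing = 1; *verbose = 0;
    for (j = 1; j < argc - 1 && argv[j][0] == '-'
                && argv[j][1] != '\0' && argv[j][2] == '\0'; j++) {
        if (argv[j][1] == 's') *listing = 0;       /* symbol table only */
        else if (argv[j][1] == 'v') *verbose = 1;  /* show tetras as read */
        else break;                                /* unknown option */
    }
    return (j == argc - 1) ? 0 : -1;
}
```

A caller would open `argv[argc - 1]` after a successful return, as section 3 does.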

fprintf: int (), <stdio.h>.

stderr: FILE \*, <stdio.h>.

```
4. ⟨ Global variables 4 ⟩ ≡
  int listing;  /* are we listing everything? */
  int verbose;  /* are we also showing the tetras of input as they are read? */
  FILE *mmo_file;  /* the input file */
```

See also sections 11, 16, and 29.

This code is used in section 1.

```
5. ⟨ Prototype preparations 5 ⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
```

This code is used in section 1.

**6.** A complete definition of mmo format appears in the MMIXAL document. Here we need to define only the basic constants used for interpretation.

```
#define mm #98  /* the escape code of mmo format */
#define lop_quote #0  /* the quotation lopcode */
#define lop_loc #1  /* the location lopcode */
#define lop_skip #2  /* the skip lopcode */
#define lop_fixo #3  /* the octabyte-fix lopcode */
#define lop_fixr #4  /* the relative-fix lopcode */
#define lop_fixrx #5  /* extended relative-fix lopcode */
#define lop_file #6  /* the file name lopcode */
#define lop_line #7  /* the file position lopcode */
#define lop_spec #8  /* the special hook lopcode */
#define lop_pre #9  /* the preamble lopcode */
#define lop_post #a  /* the postamble lopcode */
#define lop_stab #b  /* the symbol table lopcode */
#define lop_end #c  /* the end-it-all lopcode */
```
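The program later tests the first byte of each tetrabyte against `mm`, switches on the second byte as the lopcode, and refers to the last two bytes as Y and Z (or jointly as YZ). A small sketch of that field split is given here; the struct `lop_fields` and the function `split_tetra` are hypothetical names, and the layout is taken from how the main loop uses `buf[0]` through `buf[3]`.

```c
#include <assert.h>
#include <stdint.h>

#define MM 0x98  /* the escape code */

typedef struct {
    int is_lop;        /* nonzero if the first byte is the escape code */
    unsigned lopcode;  /* second byte */
    unsigned y, z;     /* third and fourth bytes */
    unsigned yz;       /* third and fourth bytes as one 16-bit field */
} lop_fields;

/* Split a tetrabyte into the fields the main loop examines. */
lop_fields split_tetra(uint32_t tet)
{
    lop_fields f;
    f.is_lop = ((tet >> 24) == MM);
    f.lopcode = (tet >> 16) & 0xff;
    f.y = (tet >> 8) & 0xff;
    f.z = tet & 0xff;
    f.yz = tet & 0xffff;
    assert(f.yz == ((f.y << 8) | f.z));  /* YZ is just Y followed by Z */
    return f;
}
```

For example, the tetrabyte `0x98090101` would be read as the preamble lopcode `#9` with Y = 1 and Z = 1.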

FILE, <stdio.h>.

fopen: **FILE** \*(), <stdio.h>.

7. Low-level arithmetic. This program is intended to work correctly whenever an int has at least 32 bits.

```
⟨ Type definitions 7 ⟩ ≡
  typedef unsigned char byte;  /* a monobyte */
  typedef unsigned int tetra;  /* a tetrabyte */
  typedef struct { tetra h, l;
  } octa;  /* an octabyte */
```

This code is used in section 1.

**8.** The *incr* subroutine adds a signed integer to an (unsigned) octabyte.

```
⟨ Subroutines 8 ⟩ ≡
  octa incr ARGS((octa, int));
  octa incr(o, delta)
        octa o;
        int delta;
  {
     register tetra t; octa x;
     if (delta ≥ 0) {
        t = #ffffffff - delta;
        if (o.l ≤ t) x.l = o.l + delta, x.h = o.h;
        else x.l = o.l - t - 1, x.h = o.h + 1;
     }
     else {
        t = -delta;
        if (o.l ≥ t) x.l = o.l - t, x.h = o.h;
        else x.l = o.l + (#ffffffff + delta) + 1, x.h = o.h - 1;
     }
     return x;
  }
```

See also sections 9, 10, and 26.

This code is used in section 1.
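The carry and borrow cases of *incr* are easy to misread, so a cross-check against native 64-bit arithmetic may help. The sketch below restates the half-by-half method with fixed-width types (`incr_halves`) and compares it with a 64-bit reference (`incr_checked`). Both function names are assumptions for this example, and the use of `uint64_t` is exactly what the original program avoids.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;

/* Half-by-half addition in the style of incr. */
octa incr_halves(octa o, int32_t delta)
{
    octa x;
    uint32_t t;
    if (delta >= 0) {
        t = 0xffffffffu - (uint32_t)delta;      /* largest o.l that does not carry */
        if (o.l <= t) x.l = o.l + (uint32_t)delta, x.h = o.h;
        else x.l = o.l - t - 1, x.h = o.h + 1;  /* carry into the high half */
    } else {
        t = 0u - (uint32_t)delta;               /* magnitude of delta */
        if (o.l >= t) x.l = o.l - t, x.h = o.h;
        else x.l = o.l - t, x.h = o.h - 1;      /* borrow; the low half wraps */
    }
    return x;
}

/* Same result, verified against a 64-bit computation. */
octa incr_checked(octa o, int32_t delta)
{
    uint64_t v = (((uint64_t)o.h << 32) | o.l) + (uint64_t)(int64_t)delta;
    octa x = incr_halves(o, delta);
    assert(x.h == (uint32_t)(v >> 32) && x.l == (uint32_t)v);
    return x;
}
```

Adding 1 to `{0, 0xffffffff}` exercises the carry branch, and adding -1 to `{1, 0}` exercises the borrow branch.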

9. Low-level input. The tetrabytes of an mmo file are stored in friendly big-endian fashion, but this program is supposed to work also on computers that are little-endian. Therefore we read four successive bytes and pack them into a tetrabyte, instead of reading a single tetrabyte.

```
⟨ Subroutines 8 ⟩ +≡
  void read_tet ARGS((void));
  void read_tet()
  {
     if (fread(buf, 1, 4, mmo_file) ≠ 4) {
        fprintf(stderr, "Unexpected end of file after %d tetras!\n", count);
        exit(-3);
     }
     yz = (buf[2] ≪ 8) + buf[3];
     tet = (((buf[0] ≪ 8) + buf[1]) ≪ 16) + yz;
     if (verbose) printf("  %08x\n", tet);
     count++;
  }
10. ⟨ Subroutines 8 ⟩ +≡
  byte read_byte ARGS((void));
  byte read_byte()
  {
     register byte b;
     if (¬byte_count) read_tet();
     b = buf[byte_count];
     byte_count = (byte_count + 1) & 3;
     return b;
  }
11. ⟨ Global variables 4 ⟩ +≡
  int count;  /* the number of tetrabytes we've read */
  int byte_count;  /* index of the next-to-be-read byte */
  byte buf[4];  /* the most recently read bytes */
  int yz;  /* the two least significant bytes */
  tetra tet;  /* buf bytes packed big-endianwise */
12. ⟨ Initialize everything 3 ⟩ +≡
  count = byte_count = 0;
```
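The packing step in `read_tet` is independent of the host's byte order because it combines the four bytes arithmetically. A sketch of the same idea with fixed-width types follows; `pack_tetra` and `yz_field` are hypothetical names. The casts to `uint32_t` are an addition of this sketch: they keep the shifts out of signed `int` arithmetic, where shifting a value such as `#98` into the top byte could overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four bytes, most significant first, into a 32-bit tetrabyte. */
uint32_t pack_tetra(const unsigned char *b)
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* The YZ field is the low-order 16 bits of the tetrabyte. */
unsigned yz_field(uint32_t tet)
{
    return (unsigned)(tet & 0xffffu);
}
```

Given the bytes `98 09 01 01` in file order, `pack_tetra` yields `0x98090101` on any host.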

13. The main loop. Now for the bread-and-butter part of this program.

```
⟨ List the next item 13 ⟩ ≡
  {
     read_tet();
  loop: if (buf[0] ≡ mm)
        switch (buf[1]) {
        case lop_quote: if (yz ≠ 1) err("YZ field of lop_quote should be 1");
           read_tet(); break;
        ⟨ Cases for lopcodes in the main loop 18 ⟩
        default: err("Unknown lopcode");
        }
     if (listing) ⟨ List tet as a normal item 15 ⟩;
  }
```

This code is used in section 1.

14. We want to catch all cases where the rules of mmo format are not obeyed. The *err* macro ameliorates this somewhat tedious chore.

```
#define err(m) { fprintf(stderr, "Error in tetra %d: %s!\n", count, m); continue; }
```

15. In a normal situation, the newly read tetrabyte is simply supposed to be loaded into the current location. We list not only the current location but also the current file position, if *cur\_line* is nonzero and *cur\_loc* belongs to segment 0.

```
⟨ List tet as a normal item 15 ⟩ ≡
   {
      printf("%08x%08x: %08x", cur_loc.h, cur_loc.l, tet);
      if (¬cur_line) printf("\n");
      else {
         if (cur_loc.h & #e0000000) printf("\n");
         else {
            if (cur_file ≡ listed_file) printf(" (line %d)\n", cur_line);
            else {
               printf(" (\"%s\", line %d)\n", file_name[cur_file], cur_line);
               listed_file = cur_file;
            }
         }
         cur_line++;
      }
      cur_loc = incr(cur_loc, 4); cur_loc.l &= -4;
   }
```

This code is used in section 13.

```
16. ⟨ Global variables 4 ⟩ +≡
   octa cur_loc;  /* the current location */
   int listed_file;  /* the most recently listed file number */
   int cur_file;  /* the most recently selected file number */
   int cur_line;  /* the current position in cur_file */
   char *file_name[256];  /* file names seen */
   octa tmp;  /* an octabyte of temporary interest */
17. ⟨ Initialize everything 3 ⟩ +≡
   cur_loc.h = cur_loc.l = 0;
   listed_file = cur_file = -1;
   cur_line = 0;
```

buf: byte [], §11. count: int, §11. fprintf: int (), <stdio.h>. h: tetra, §7. incr: octa (), §8. l: tetra, §7. listing: int, §4. lop\_quote = #0, §6. mm = #98, §6. octa = struct, §7. printf: int (), <stdio.h>. read\_tet: void (), §9. stderr: FILE \*, <stdio.h>. tet: tetra, §11. yz: int, §11.

18. The simple lopcodes. We have already implemented *lop\_quote*, which falls through to the normal case after reading an extra tetrabyte. Now let's consider the other lopcodes in turn.

```
#define y buf [2] /* the next-to-least significant byte */
#define z buf [3] /* the least significant byte */
⟨ Cases for lopcodes in the main loop 18⟩ ≡
case lop_loc: if (z ≡ 2) {
    j = y; read_tet(); cur_loc.h = (j ≪ 24) + tet;
} else if (z ≡ 1) cur_loc.h = y ≪ 24;
else err("Z field of lop_loc should be 1 or 2");
```

19. Fixups load information out of order, when future references have been resolved. The current file name and line number are not considered relevant.

```
⟨ Cases for lopcodes in the main loop 18 ⟩ +≡
case lop_fixo: if (z ≡ 2) {
     j = y; read_tet(); tmp.h = (j ≪ 24) + tet;
  } else if (z ≡ 1) tmp.h = y ≪ 24;
  else err("Z field of lop_fixo should be 1 or 2");
  read_tet(); tmp.l = tet;
  if (listing) printf("%08x%08x: %08x%08x\n", tmp.h, tmp.l, cur_loc.h, cur_loc.l);
  continue;
case lop_fixr: delta = yz;
  goto fixr;
case lop_fixrx: j = yz; if (j ≠ 16 ∧ j ≠ 24)
     err("YZ field of lop_fixrx should be 16 or 24");
  read_tet();
  delta = tet;
  if (delta & #fe000000) err("increment of lop_fixrx is too large");
fixr: tmp = incr(cur_loc, -(delta ≥ #1000000 ? (delta & #ffffff) - (1 ≪ j) : delta) ≪ 2);
  if (listing) printf("%08x%08x: %08x\n", tmp.h, tmp.l, delta);
  continue;
```

20. The space for file names isn't allocated until we are sure we need it.

```
⟨ Cases for lopcodes in the main loop 18 ⟩ +≡
case lop_file: if (file_name[y]) {
     for (j = z; j > 0; j--) read_tet();
     cur_file = y;
     if (z) err("Two file names with the same number");
  } else {
     if (¬z) err("No name given for newly selected file");
     file_name[y] = (char *) calloc(4 * z + 1, 1);
     if (¬file_name[y]) {
        fprintf(stderr, "No room to store the file name!\n"); exit(-4);
     }
     cur_file = y;
     for (j = z, p = file_name[y]; j > 0; j--, p += 4) {
        read_tet();
        *p = buf[0]; *(p + 1) = buf[1]; *(p + 2) = buf[2]; *(p + 3) = buf[3];
     }
  }
  cur_line = 0; continue;
case lop_line: if (cur_file < 0) err("No file was selected for lop_line");
  cur_line = yz; continue;
```

21. Special bytes in the file might be in synch with the current location and/or the current file position, so we list those parameters too.

```
buf: byte [], §11.
calloc: void *(), <stdlib.h>.
cur_file: int, §16.
cur_line: int, §16.
cur_loc: octa, §16.
delta: int, §8.
err = macro (), §14.
exit: void (), <stdlib.h>.
file_name: char *[], §16.
fprintf: int (), <stdio.h>.
h: tetra, §7.
incr: octa (), §8.
j: register int, §1.
l: tetra, §7.
listed_file: int, §16.
listing: int, §4.
loop: label, §13.
lop_file = #6, §6.
lop_fixo = #3, §6.
lop_fixr = #4, §6.
lop_fixrx = #5, §6.
lop_line = #7, §6.
lop_loc = #1, §6.
lop_quote = #0, §6.
lop_skip = #2, §6.
lop_spec = #8, §6.
mm = #98, §6.
p: register char *, §1.
printf: int (), <stdio.h>.
read_tet: void (), §9.
stderr: FILE *, <stdio.h>.
tet: tetra, §11.
tmp: octa, §16.
yz: int, §11.
```

**22.** The other cases shouldn't appear in the main loop.

```
⟨ Cases for lopcodes in the main loop 18 ⟩ +≡
case lop_pre: err("Can't have another preamble");
case lop_post: postamble = 1;
  if (y) err("Y field of lop_post should be zero");
  if (z < 32) err("Z field of lop_post must be 32 or more");
  continue;
case lop_stab: err("Symbol table must follow postamble");
case lop_end: err("Symbol table can't end before it begins");
```

**23.** The preamble and postamble. Now here's what we do before and after the main loop.

```
⟨ List the preamble 23 ⟩ ≡
  read_tet();  /* read the first tetrabyte of input */
  if (buf[0] ≠ mm ∨ buf[1] ≠ lop_pre) {
     fprintf(stderr, "Input is not an MMO file (first two bytes are wrong)!\n");
     exit(-5);
  }
  if (y ≠ 1) fprintf(stderr,
          "Warning: I'm reading this file as version 1, not version %d!\n", y);
  if (z > 0) {
     j = z;
     read_tet();
     if (listing) printf("File was created %s", asctime(localtime((time_t *) &tet)));
     for (j--; j > 0; j--) {
        read_tet();
        if (listing) printf("Preamble data %08x\n", tet);
     }
  }
```

This code is used in section 1.

```
24. ⟨ List the postamble 24 ⟩ ≡
  for (j = z; j < 256; j++) {
     read_tet(); tmp.h = tet; read_tet();
     if (listing) {
        if (tmp.h ∨ tet) printf("g%03d: %08x%08x\n", j, tmp.h, tet);
        else printf("g%03d: 0\n", j);
     }
  }
```

This code is used in section 1.
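The postamble loop reads two tetrabytes for each global register from number Z up to 255, high half first. The arithmetic can be sketched as below; `postamble_tetras` and `global_value` are hypothetical helpers, not part of MMOTYPE, and the lower bound of 32 on Z reflects the error reported in section 22.

```c
#include <assert.h>
#include <stdint.h>

/* Number of tetrabytes that follow lop_post when its Z field is z. */
int postamble_tetras(int z)
{
    assert(z >= 32 && z <= 255);  /* Z is one byte and must be 32 or more */
    return 2 * (256 - z);         /* two tetrabytes per global register */
}

/* Join the two tetrabytes of one register value, high half first. */
uint64_t global_value(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

With Z = 255 only register 255 is listed, so exactly two tetrabytes follow.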

```
asctime: char *(), <time.h>.
buf: byte [], §11.
err = macro (), §14.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
h: tetra, §7.
j: register int, §1.
listing: int, §4.
localtime: struct tm *(), <time.h>.
lop_end = #c, §6.
lop_post = #a, §6.
lop_pre = #9, §6.
lop_stab = #b, §6.
mm = #98, §6.
postamble: register int, §1.
printf: int (), <stdio.h>.
read_tet: void (), §9.
stderr: FILE *, <stdio.h>.
tet: tetra, §11.
tmp: octa, §16.
y = macro, §18.
z = macro, §18.
```

**25.** The symbol table. Finally we come to the symbol table, which is the most interesting part of this program because it recursively traces an implicit ternary trie structure.

```
⟨ List the symbol table 25 ⟩ ≡
  read_tet();
  if (buf[0] ≠ mm ∨ buf[1] ≠ lop_stab) {
     fprintf(stderr, "Symbol table does not follow the postamble!\n");
     exit(-6);
  }
  if (yz) fprintf(stderr, "YZ field of lop_stab should be zero!\n");
  printf("Symbol table (beginning at tetra %d):\n", count);
  stab_start = count;
  sym_ptr = sym_buf;
  print_stab();
  ⟨ Check the lop_end 30 ⟩;
```

This code is used in section 1.

**26.** The main work is done by a recursive subroutine called *print\_stab*, which manipulates a global array *sym\_buf* containing the current symbol prefix; the global variable *sym\_ptr* points to the first unfilled character of that array.

```
⟨ Subroutines 8 ⟩ +≡
  void print_stab ARGS((void));
  void print_stab()
  {
     register int m = read_byte();  /* the master control byte */
     register int c;  /* the character at the current trie node */
     register int j, k;
     if (m & #40) print_stab();  /* traverse the left subtrie, if it is nonempty */
     if (m & #2f) {
        ⟨ Read the character c 27 ⟩;
        *sym_ptr++ = c;
        if (sym_ptr ≡ &sym_buf[sym_length_max]) {
           fprintf(stderr, "Oops, the symbol is too long!\n"); exit(-7);
        }
        if (m & #f) ⟨ Print the current symbol with its equivalent and serial number 28 ⟩;
        if (m & #20) print_stab();  /* traverse the middle subtrie */
        sym_ptr--;
     }
     if (m & #10) print_stab();  /* traverse the right subtrie, if it is nonempty */
  }
```
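The traversal can be tried on a byte array instead of an object file. The following is a minimal sketch, not part of MMOTYPE: `stab_walker`, `next_byte`, `emit_symbol`, `skip_equivalent_and_serial`, and `walk_stab` are hypothetical names, the buffer sizes are arbitrary, and the input is assumed to be well formed (no bounds checks on the encoded bytes). It records each symbol's characters and skips the equivalent and serial number, following the byte counts used in §28.

```c
#include <stdlib.h>
#include <string.h>

enum { PREFIX_MAX = 64, OUT_MAX = 256 };

struct stab_walker {
    const unsigned char *next;   /* next unread byte of the encoded trie */
    char prefix[PREFIX_MAX];     /* characters on the path to the current node */
    size_t len;                  /* how many characters of prefix are in use */
    char out[OUT_MAX];           /* symbols found so far, one per line */
    size_t out_len;
};

static int next_byte(struct stab_walker *w) { return *w->next++; }

/* Append the current prefix to out; abort if the result would not fit. */
static void emit_symbol(struct stab_walker *w)
{
    if (w->out_len + w->len + 2 > OUT_MAX) abort();
    memcpy(w->out + w->out_len, w->prefix, w->len);
    w->out_len += w->len;
    w->out[w->out_len++] = '\n';
    w->out[w->out_len] = '\0';
}

/* Skip what section 28 would print: j = m & 0xf selects the length of the
   equivalent, and the serial number ends at the first byte >= 128. */
static void skip_equivalent_and_serial(struct stab_walker *w, int j)
{
    int n = (j == 15) ? 1 : (j <= 8) ? j : j - 8;
    while (n-- > 0) next_byte(w);
    while (next_byte(w) < 128) continue;
}

static void walk_stab(struct stab_walker *w)
{
    int m = next_byte(w);                    /* master control byte */
    if (m & 0x40) walk_stab(w);              /* left subtrie */
    if (m & 0x2f) {
        int hi = (m & 0x80) ? next_byte(w) : 0;
        int c = next_byte(w);
        if (hi) c = '?';                     /* 16-bit characters are not shown */
        if (w->len == PREFIX_MAX) abort();   /* symbol too long for this sketch */
        w->prefix[w->len++] = (char)c;
        if (m & 0x0f) {
            emit_symbol(w);
            skip_equivalent_and_serial(w, m & 0x0f);
        }
        if (m & 0x20) walk_stab(w);          /* middle subtrie extends the prefix */
        w->len--;
    }
    if (m & 0x10) walk_stab(w);              /* right subtrie */
}
```

For instance, after zeroing a `stab_walker` and pointing `next` at the ten bytes `30 61 01 62 05 81 0f 62 07 82` (hex), `walk_stab` visits a node `a` whose middle subtrie holds `b` and whose right subtrie holds another `b`, so `out` receives `ab` and then `b`.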

**27.** The present implementation doesn't support Unicode; characters with more than 8-bit codes are printed as '?'. However, the changes for 16-bit codes would be quite easy if proper fonts for Unicode output were available. In that case, *sym\_buf* would be an array of wyde characters.

```
⟨Read the character c 27⟩ ≡
  if (m & 0x80) j = read_byte();   /* 16-bit character */
  else j = 0;
  c = read_byte();
  if (j) c = '?';   /* oops, we can't print (j << 8) + c easily at this time */
This code is used in section 26.

28. ⟨Print the current symbol with its equivalent and serial number 28⟩ ≡
  {
    *sym_ptr = '\0';
    j = m & 0xf;
    if (j == 15) sprintf(equiv_buf, "$%03d", read_byte());
    else if (j <= 8) {
      strcpy(equiv_buf, "#");
      for (; j > 0; j--) sprintf(equiv_buf + strlen(equiv_buf), "%02x", read_byte());
      if (strcmp(equiv_buf, "#0000") == 0) strcpy(equiv_buf, "?");   /* undefined */
    } else {
      strncpy(equiv_buf, "#20000000000000", 33 - 2 * j);
      equiv_buf[33 - 2 * j] = '\0';
      for (; j > 8; j--) sprintf(equiv_buf + strlen(equiv_buf), "%02x", read_byte());
    }
    for (j = k = read_byte(); ; k = read_byte(), j = (j << 7) + k)
      if (k >= 128) break;   /* the serial number is now j - 128 */
    printf("     %s = %s (%d)\n", sym_buf + 1, equiv_buf, j - 128);
  }
This code is used in section 26.

29. #define sym_length_max 1000
⟨Global variables 4⟩ +≡
  int stab_start;                 /* where the symbol table began */
  char sym_buf[sym_length_max];   /* the characters on middle transitions to current node */
  char *sym_ptr;                  /* the character in sym_buf following the current prefix */
  char equiv_buf[20];             /* equivalent of the current symbol */
```
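The encoding handled in §28 can be exercised on its own. The sketch below is a hedged restatement, not the program's code: `decode_symbol_tail` is a hypothetical name, it reads from a byte array rather than through `read_byte`, and it builds the implied prefix digit by digit instead of copying from a string constant. It assumes the reading of §28 in which the low four bits `j` of the master control byte mean: 15, a one-byte register number; 1 to 8, that many bytes of equivalent; 9 to 14, `j - 8` bytes preceded by an implied leading byte `#20` and zero bytes.

```c
#include <stdio.h>
#include <string.h>

/* Decode the equivalent and serial number that follow a symbol's last
   character.  j is the low four bits of the master control byte (1..15).
   Returns the number of bytes consumed from p. */
static size_t decode_symbol_tail(const unsigned char *p, int j,
                                 char equiv[20], int *serial)
{
    const unsigned char *start = p;
    int k, s;
    if (j == 15) {
        snprintf(equiv, 20, "$%03d", *p++);           /* a register number */
    } else {
        int nbytes = (j <= 8) ? j : j - 8;            /* bytes actually stored */
        size_t len;
        strcpy(equiv, "#");
        if (j > 8) {                                   /* implied prefix 20 00 ... 00 */
            strcat(equiv, "20");
            for (k = 2; k < 16 - 2 * nbytes; k++) strcat(equiv, "0");
        }
        for (; nbytes > 0; nbytes--) {
            len = strlen(equiv);
            snprintf(equiv + len, 20 - len, "%02x", (unsigned)*p++);
        }
        if (j <= 8 && strcmp(equiv, "#0000") == 0) strcpy(equiv, "?");   /* undefined */
    }
    /* Seven bits per byte, most significant group first; the final byte
       is the one whose value is at least 128. */
    for (s = k = *p++; k < 128; k = *p++, s = (s << 7) + k) continue;
    *serial = s - 128;
    return (size_t)(p - start);
}
```

With `j = 15` and the bytes `07 81` (hex) this yields the equivalent `$007` and serial number 1; the two serial bytes `01 82` yield serial number 130.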

```
ARGS = macro (), §5.
buf: byte [], §11.
count: int, §11.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
lop_end = #c, §6.
lop_stab = #b, §6.
mm = #98, §6.
printf: int (), <stdio.h>.
read_byte: byte (), §10.
read_tet: void (), §9.
sprintf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
strcmp: int (), <string.h>.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
strncpy: char *(), <string.h>.
yz: int, §11.
```

```
30. ⟨Check the lop_end 30⟩ ≡
  while (byte_count)
    if (read_byte()) fprintf(stderr, "Nonzero byte follows the symbol table!\n");
  read_tet();
  if (buf[0] != mm || buf[1] != lop_end)
    fprintf(stderr, "The symbol table isn't followed by lop_end!\n");
  else if (count != stab_start + yz + 1)
    fprintf(stderr, "YZ field at lop_end should have been %d!\n", count - yz - 1);
  else {
    if (verbose) printf("Symbol table ends at tetra %d.\n", count);
    if (fread(buf, 1, 1, mmo_file))
      fprintf(stderr, "Extra bytes follow the lop_end!\n");
  }
This code is used in section 25.
```
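The consistency test in §30 is a single arithmetic relation, sketched here with hypothetical helper names (`lop_end_yz_ok`, `lop_end_yz_required`). The sketch assumes that `count` advances by one for each tetra read, so that the YZ field of `lop_end` equals the number of tetras between `lop_stab` and `lop_end`.

```c
/* stab_start is the value of count just after lop_stab was read;
   count_at_end is the value of count just after lop_end was read. */
static int lop_end_yz_ok(int stab_start, int count_at_end, int yz)
{
    return count_at_end == stab_start + yz + 1;
}

/* Solving the tested relation for YZ gives the only value that passes. */
static int lop_end_yz_required(int stab_start, int count_at_end)
{
    return count_at_end - stab_start - 1;
}
```

Under that assumption, a table that begins at tetra 12 and whose `lop_end` is read as tetra 15 passes only when YZ is 2.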

## 31. Names of the sections.

```
⟨Cases for lopcodes in the main loop 18, 19, 20, 21, 22⟩ Used in section 13.
⟨Check the lop_end 30⟩ Used in section 25.
⟨Global variables 4, 11, 16, 29⟩ Used in section 1.
⟨Initialize everything 3, 12, 17⟩ Used in section 1.
⟨List tet as a normal item 15⟩ Used in section 13.
⟨List the next item 13⟩ Used in section 1.
⟨List the postamble 24⟩ Used in section 1.
⟨List the preamble 23⟩ Used in section 1.
⟨List the symbol table 25⟩ Used in section 1.
⟨Print the current symbol with its equivalent and serial number 28⟩ Used in section 26.
⟨Process the command line 2⟩ Used in section 1.
⟨Prototype preparations 5⟩ Used in section 1.
⟨Read the character c 27⟩ Used in section 26.
⟨Subroutines 8, 9, 10, 26⟩ Used in section 1.
⟨Type definitions 7⟩ Used in section 1.
```

```
buf: byte [], §11.
byte_count: int, §11.
count: int, §11.
fprintf: int (), <stdio.h>.
fread: size_t (), <stdio.h>.
lop_end = #c, §6.
mm = #98, §6.
mmo_file: FILE *, §4.
printf: int (), <stdio.h>.
read_byte: byte (), §10.
read_tet: void (), §9.
stab_start: int, §29.
stderr: FILE *, <stdio.h>.
verbose: int, §4.
yz: int, §11.
```

The following list, a compilation of the indexes produced from all the MMIXware programs and documentation, shows the section numbers where each identifier makes an appearance. Underlined numbers indicate a place of definition. Single-letter identifiers are indexed only when they are defined.

Further characteristics of the program segments, such as 'system dependencies', can also be found here, together with significant error messages and other indexable things like the names of people whose work is cited.

Digits follow letters in the lexicographic order of this index. For example, 't1' follows 'tt'; and '16ADDU' precedes '2ADDU'.
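The ordering rule can be sketched as a comparison function. Only the rule that digits follow letters, and the two examples above, come from the text; ignoring case and placing other characters such as `_` before letters are assumptions inferred from the order of entries like `buf_size`, `buffer`, `buf0`. The names `index_rank` and `index_compare` are hypothetical.

```c
#include <ctype.h>

/* Rank of one character: other characters first (assumption), then
   letters without regard to case (assumption), then digits. */
static int index_rank(unsigned char ch)
{
    if (isdigit(ch)) return 512 + ch;
    if (isalpha(ch)) return 256 + tolower(ch);
    return ch;
}

/* Negative if a precedes b, positive if a follows b, and zero if the two
   differ at most in case.  A proper prefix precedes its extensions. */
static int index_compare(const char *a, const char *b)
{
    while (*a && *b) {
        int ra = index_rank((unsigned char)*a);
        int rb = index_rank((unsigned char)*b);
        if (ra != rb) return ra < rb ? -1 : 1;
        a++;
        b++;
    }
    if (*a) return 1;
    if (*b) return -1;
    return 0;
}
```

Because the comparison is character by character rather than numeric, `16ADDU` precedes `2ADDU`. The function returns zero for `ADD` versus `add`, so a real index would need a further tie-breaking rule that this sketch leaves out.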

```
??: MMIX-PIPE 25.
%: MMIX-CONFIG 18.
__STDC__: MMIX-ARITH 2, MMIX-IO 2, MMIX-PIPE 6, MMIX-SIM 11, MMIXAL 31, MMOTYPE 5.
a: MMIX-ARITH 28, 29, 59, MMIX-PIPE 44, 91, 167, 381, 384, MMIX-SIM 61, 114, 117.
aa: MMIX-CONFIG 16, 23, 31, 32, MMIX-PIPE <u>167</u>, 177, 181, 186, <u>187</u>, <u>189</u>, <u>191</u>, 193, 196, 199, 205, 233, 234.
aaaaa: MMIX-PIPE 237, 243, 244.
abort: MMIX-IO 8.
absolute value, floating point: MMIX 13.
ABSTIME: MMIX-PIPE 89, MMIX-SIM 77.
acc: MMIX-ARITH 8, 11, 12, 13, 19, MMIXAL 29, 83, 92, 93, 94, 95, 96, 107, 109, 126, 127, 131.
access_time: MMIX-CONFIG 16, 23, MMIX-PIPE 167, 217, 224, 230, 233, 234, 257, 261, 262, 266, 267, 268, 270, 271, 272, 273, 274, 288, 291, 292, 295, 296, 300, 326, 353, 354, 358, 359, 360, 364, 365, 366.
acctm: MMIX-CONFIG 13, 15, 23.
accuracy loss: MMIX-ARITH 31.
ADD: MMIX 9, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 84, MMIXAL 63.
add: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 140.
add_go: MMIX-PIPE 331.
ADDI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 84.
addr: MMIX-IO <u>4</u>, MMIX-MEM <u>2</u>, <u>3</u>, MMIX-PIPE 40, 43, 44, 73, 89, 95, 100, 115, 116, 261, 262, 281, <u>297</u>, 356, <u>378</u>, <u>379</u>, <u>381</u>, <u>384</u>, MMIX-SIM 20, 114, 117.
addr_found: MMIX-PIPE 256.
ADDU: MMIX 7, 9, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85, MMIXAL 63.
addu: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 139.
ADDUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85, 131.
Advanced Micro Devices: MMIX 42.
after: MMIX-PIPE 282.
alf: MMIX-PIPE 192, 193, 195, 205.
align_bits: MMIXAL 62, 102, 107.
alloc_cache: MMIX-CONFIG 31, 35.
alloc_slot: MMIX-PIPE 204, 205, 218, 222, 225, 261, 272, 274, 276, 298, 300, 326.
Alpha computers: MMIX 45, MMIX-PIPE 217.
alt_name: MMIX-SIM 24.
AND: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.
and: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138, MMIXAL 82, 97, 101.
Anderson, Jennifer-Ann Monique: MMIX 40.
ANDI: MMIX-PIPE 47, MMIX-SIM 54, 86.
ANDN: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.
andn: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138.
ANDNH: MMIX 13, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 86, MMIXAL 63.
ANDNI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 86.
ANDNL: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.
ANDNMH: MMIX 13, MMIX-PIPE 47, MMIX-SIM <u>54</u>, 86, MMIXAL 63.
ANDNML: MMIX 13, MMIX-PIPE 47, MMIX-SIM <u>54</u>, 86, MMIXAL 63.
Aragon, Cecilia Rodriguez: MMIX-SIM 16.
arg: MMIX-SIM 143, 145, 146.
arg_count: MMIX-PIPE 374, 380, MMIX-SIM <u>110</u>, 111.
arg_loc: MMIX-PIPE 380.
argc: MMIX-SIM 37, 141, 142, 163, MMIXAL 136, 137, MMMIX 2, 3, MMOTYPE 1, 2, 3.
ARGS: MMIX-ARITH <u>2</u>, MMIX-IO <u>2</u>, MMIX-PIPE <u>6</u>, MMIX-SIM <u>11</u>, MMIXAL <u>31</u>, MMOTYPE 5.
argv: MMIX-SIM 141, 142, 144, MMIXAL 136, 137, MMMIX 2, 3, MMOTYPE <u>1</u>, 2, 3.
arith_exc: MMIX-PIPE 44, 46, 59, 98, 100, 146, 307, 308.
ASCII: MMIX 6.
asctime: MMOTYPE 23.
assemble: MMIXAL <u>52</u>, 117, 119, 128.
assemble_inst: MMIXAL 119, 129, 130, 131.
```

assemble\_X: MMIXAL 119, 124, 125, 126, 127.
assembly language: MMIXAL 1.
assoc: MMIX-CONFIG 13, 15, 23.
AT&T Bell Laboratories: MMIX 42.
atomic instruction: MMIX 31.
Attempt to get characters...: MMIX-PIPE 381.
Attempt to put characters...: MMIX-PIPE 384.
aux: MMIX-ARITH 8, 9, 11, 12, 13, 14, 19, 24, 43, 45, MMIX-PIPE 20, 21, 343, MMIX-SIM 13, 37, 88, 155, 159, MMIXAL <u>27</u>, 28, 101.
avoid\_D: MMIX-PIPE 273, <u>277</u>.
awaken: MMIX-PIPE 125, 222, 224, 245.
b: MMIX-ARITH 28, 29, 59, MMIX-PIPE 44, 56, 82, 157, 167, 172, MMIX-SIM 27, 61, 91, 160, MMIXAL <u>48</u>, MMOTYPE <u>10</u>.
B\_BIT: MMIX-PIPE <u>54</u>, 118, 304, 323, 329, 330, 332, 336, 337.
backward\_local: MMIXAL 90, 91, 109.
backward\_local\_host: MMIXAL 89, 90, 91.
Bad object file: MMIX-SIM 26.
bad\_address: MMMIX 9, 11, 25.
bad\_fetch: MMIX-PIPE 288, 293, 296, 298, 301.
bad\_guesses: MMIX-SIM 93, 139, 140.
bad\_inst\_mask: MMIX-PIPE 304, 305, 323.
bad\_resume: MMIX-PIPE 323.
bb: MMIX-CONFIG 16, 23, 30, 31, 32, 33, 35, 36, 37, MMIX-PIPE 167, 170, 172, 179, 185, 193, 201, 203, 216, 217, 218, 219, 221, 223, 224, 226, 227, 228, 229, 259, 262, 265, 268, 271, 273, 275, 276, 280, 292, 294, 364, 378, 379.
BDIF: MMIX 11, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.
bdif: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 344.
BDIFI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 87.
before: MMIX-PIPE 282.
Bentley, Jon Louis: MMIXAL 54.
Berc, Lance Michael: MMIX 40.
BEV: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.
BEVB: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93.
big-endian versus little-endian: MMIX 6, 12, MMIX-IO 16, MMIX-PIPE 304, MMIXAL 47, MMMIX 10.
bignum: MMIX-ARITH 54, 59, 60, 61, 62, 66, 68, 81, 82.
bignum\_compare: MMIX-ARITH 54, 61, 64,
bignum\_dec: MMIX-ARITH 54, 62, 65, 83.
bignum\_double: MMIX-ARITH 68, 82, 83.
bignum\_prec: MMIX-ARITH <u>59</u>, 62, 65, 83.
bignum\_times\_ten: MMIX-ARITH 54, 60, 64, 65, 82.
binary files: MMMIX 9.
binary-to-decimal conversion: MMIX-ARITH 54.
binary\_check: MMIXAL 101.
BinaryRead: MMIX-SIM 4, MMIXAL 69.
BinaryReadWrite: MMIX-SIM 4, MMIXAL 69.
BinaryWrite: MMIX-SIM 4, MMIXAL 69.
bit stuffing: MMIX 6, 7, 19.
bit\_code\_map: MMIX-PIPE 54, 56.
bits: MMIXAL 62, 64.
bkpt: MMIX-SIM 16, 58, 63, 82, 83, 161, 162.
blksz: MMIX-CONFIG 13, 15, 23.
block\_diff: MMIX-PIPE 217, 219.
BN: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.
BNB: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93.
BNN: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.
BNNB: MMIX-PIPE 47, MMIX-SIM 54, 93.
BNP: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.
BNPB: MMIX-PIPE 47, MMIX-SIM 54, 93.
BNZ: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.
BNZB: MMIX-PIPE 47, MMIX-SIM 54, 93.
BOD: MMIX 17, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93, MMIXAL 63.
BODB: MMIX-PIPE 47, MMIX-SIM 54, 93.
**bool**: MMIX-ARITH 1, 9, 29, 70, MMIX-PIPE 11, 12, 20, 21, 40, 44, 65, 66, 68, 75, 148, 169, 170, 175, 176, 202, 203, 238, 242, 303, 315, MMIX-SIM 9, 13, 48, 52, 61, 129, 140, 143, 144, 150, 151, MMIXAL 26.
bool\_mult: MMIX-ARITH 29, MMIX-PIPE 21, 344, MMIX-SIM <u>13</u>, 87.
Boolean multiplication: MMIX 12.
borrow: MMIX-ARITH 62.
BP: MMIX 17, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93, MMIXAL 63.
bp: MMMIX 15, 16.
bp\_a: MMIX-CONFIG 15, 37, MMIX-PIPE 150, 151, 152, 153.
bp\_amask: MMIX-PIPE 151, 152, 153, <u>154</u>.
bp\_b: MMIX-CONFIG 15, 37, MMIX-PIPE 150, 151, 152, 153.
bp\_bad\_stat: MMIX-PIPE 154, 155, 162.
bp\_bcmask: MMIX-PIPE 151, 152, 153, 154.
bp\_c: MMIX-CONFIG 15, 37, MMIX-PIPE 150,
bp\_cmask: MMIX-PIPE 151, 152, 153, <u>154</u>.
bp\_good\_stat: MMIX-PIPE 154, 155, 162.
bp\_n: MMIX-CONFIG 15, 37, MMIX-PIPE 150, 153.
bp\_nmask: MMIX-PIPE 152, 153, 154.
bp\_npower: MMIX-PIPE 151, 152, 153, <u>154</u>, 160.
bp\_ok\_stat: MMIX-PIPE 152, <u>154</u>, 162.
bp\_rev\_stat: MMIX-PIPE 152, <u>154</u>, 162.
bp\_table: MMIX-CONFIG 37, MMIX-PIPE 150, 151, 152, 160, 162.
BPB: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93.
br: MMIX-CONFIG 28, MMIX-PIPE <u>49</u>, 51, 85, 106, 152, 155.
break\_inst: MMIX-SIM 107.
breakpoint: MMIX-PIPE 9, 10, 304, MMIX-SIM 61, 63, 82, 83, 93, 107, 109, 127,

128, 141, 149.
breakpoint\_hit: MMIX-PIPE 10, 12, 304.
BSPEC: MMIXAL 20, 43, 62, 63, 132.
buf: MMIX-ARITH 75, 76, 79, MMIX-IO 4, 12, 13, <u>14</u>, 15, <u>16</u>, 17, <u>18</u>, <u>19</u>, <u>20</u>, MMIX-MEM <u>1</u>, 2, MMIX-PIPE 381, 384, MMIX-SIM 13, 25, 26, 27, 28, 29, 33, 35, 36, 45, 114, 117, MMIXAL 47, MMOTYPE 9, 10, 11, 13, 18, 20, 21, 23, 25, 30.
buf\_max: MMIX-ARITH 73, 74, 75.
buf\_pointer: MMIX-CONFIG 9, 10.
buf\_ptr: MMIXAL 33, 34, 102, 136.
BUF\_SIZE: MMIX-CONFIG 9, 10, MMMIX 5, 6, 13, 16.
buf\_size: MMIX-SIM 40, 41, 42, 45, 143, MMIXAL 32, 34, 84, 137, 139.
buffer: MMIX-CONFIG 9, 10, MMIX-IO 12, 14, 16, 18, MMIX-SIM 4, 40, 41, 42, 45, MMIXAL 32, 33, 34, 38, 41, MMMIX 5, 6, 7, 8, 13, 15, 18, 19, 21, 22.
buf0: MMIX-ARITH 73, 74, 75, 79.
bus\_words: MMIX-CONFIG 36, 37, MMIX-PIPE 214, 216, 219, 223, 297.
bypass: MMIXAL 45, <u>102</u>, 103, 104, 132.
BYTE: MMIXAL 17, 62, 63, 117.
byte: MMIX 6.
byte: MMIX-SIM 10, 25, 27, MMOTYPE 7, 10, 11.
byte\_count: MMIX-SIM 24, 25, 27, MMOTYPE 10, <u>11</u>, 12, 30.
byte\_diff: MMIX-ARITH <u>27</u>, 28, MMIX-PIPE <u>21</u>, 344, MMIX-SIM 13, 87.
BZ: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.
BZB: MMIX-PIPE 47, MMIX-SIM 54, 93.
c: MMIX-ARITH 29, MMIX-CONFIG 16, 23, 31, MMIX-PIPE 25, 28, 31, 33, 46, 159, 167, 170, 172, 174, 176, 179, 181, 183, 185, 193, 196, 199, 201, 203, 205, 215, 217, 222, 224, 237, 326, MMOTYPE 26.
C preprocessor: MMIXAL 3.
c\_param: MMIX-CONFIG 13.
cache: MMIX-CONFIG 16, 23, 31, MMIX-PIPE <u>167</u>, 168, 169, 170, 171, 172, 173, 174, 175, 176, 178, 179, 180, 181, 182, 183, 184, 185, 192, 193, 195, 196, 198, 199, 200, 201, 202, 203, 204, 205, 215, 217, 222, 224, 237, 326.
cache\_addr: MMIX-PIPE 192, 193, 196, 201, 205, 217.
cache\_search: MMIX-PIPE 192, 193, 195, 205, 206, 217, 224, 233, 234, 262, 267, 268, 271, 272, 273, 291, 292, 296, 302, 353, 354, 365, 366, 367, 378, 379.
cacheblock: MMIX-CONFIG 32, 33, MMIX-PIPE 167, 169, 170, 171, 172, 178, 179, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 195, 196, 198, 199, 200, 201, 202, 203, 204, 205, 217, 222, 224, 232, 237, 257, 258, 378, 379.
caches: MMIX 30, MMIX-PIPE 163.
cacheset: MMIX-CONFIG 32, MMIX-PIPE 167, 186, 187, 188, 189, 190, 191, 193, 194, 196, 205.
calloc: MMIX-CONFIG 16, 18, 26, 31, 32, 33, 34, 36, 37, 38, MMIX-IO 12, MMIX-PIPE 213, MMIX-SIM 17, 24, 35, 41, 42, 77, MMIXAL 32, 38, 55, 59, 84, MMOTYPE 20.
can complement...: MMIXAL 100.
can compute...: MMIXAL 101.
can divide...: MMIXAL 101.
can multiply...: MMIXAL 101.
can negate...: MMIXAL 100.
can registerize...: MMIXAL 100.
can take serial number...: MMIXAL 100.
Can't allocate...: MMIX-CONFIG 16, 18, 31, 32, 33, 34, 36, 37, MMIX-SIM 17, 24, 41.
Can't have another...: MMOTYPE 22.
Can't open...: MMIX-CONFIG 38, MMIX-SIM 24, MMIXAL 138, MMMIX 6, 9, MMOTYPE 3.
Can't write...: MMIXAL 47.
cannot add...: MMIXAL 99.
cannot subtract...: MMIXAL 101.
cannot use...: MMIXAL 102.
Capacity exceeded...: MMIXAL 38, 55, 59.
carry: MMIX-ARITH 60, 82.
carry-save addition: MMIX 40.
catchint: MMIX-SIM 147, 148.
cc: MMIX-CONFIG 16, 23, 31, 32, MMIX-PIPE 158, <u>159</u>, <u>167</u>, 177, 181, 184, <u>185</u>, <u>222</u>, <u>224</u>, 233, <u>234</u>, <u>237</u>, 245, <u>357</u>.
cease: MMIX-PIPE 10.
ch: MMIXAL 54, 57, 61, 74, 75, 79.
Char: MMIX-SIM 39, 40, 41, MMIXAL 30, 32, 33, 37, 38, 40, 57, 62, 67, 68, 77.
char\_switch: MMIX-SIM 133, 134.
check\_ld: MMIX-SIM 94, 96.
check\_st: MMIX-SIM 95.
check\_syntax: MMIX-SIM 149.
choose\_victim: MMIX-PIPE 186, 187, 196, 205.
chunk: MMIX-CONFIG 37, MMIX-PIPE 206, 209, 210, 213, 216, 219, 223, 297, MMMIX 8, 11.
chunknode: MMIX-CONFIG 37, MMIX-PIPE <u>206</u>, 207.
citm: MMIX-CONFIG <u>13</u>, 15, 23.
clean\_block: MMIX-PIPE 178, 179, 181, 276, 365, 366, 367.
clean\_co: MMIX-PIPE 230, 231, 361, 363, 364, 368.
clean\_ctl: MMIX-PIPE 230, 231, 361, 368.
clean\_lock: MMIX-PIPE 39, 230, 233, 234, 361, 368.
cleanup: MMIX-PIPE 129, 230, 231, 232.
clearerr: MMIX-IO 13.
Clock time is...: MMIX-PIPE 14.
CMP: MMIX 15, MMIX-PIPE 47, MMIX-SIM 54, 90, MMIXAL 63.

cmp: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 143.
cmp\_fin: MMIX-PIPE 348, MMIX-SIM 90.
cmp\_neg: MMIX-PIPE 143, 348, MMIX-SIM 90.
cmp\_pos: MMIX-PIPE 143, 348, MMIX-SIM 90.
cmp\_zero: MMIX-PIPE 143, 348, MMIX-SIM 90.
cmp\_zero\_or\_invalid: MMIX-PIPE 348, MMIX-SIM 90.
CMPI: MMIX-PIPE 47, MMIX-SIM 54, 90.
CMPU: MMIX 9, 15, MMIX-PIPE 47, MMIX-SIM 54, 90, MMIXAL 63.
cmpu: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 143.
CMPUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 90.
co: MMIX-CONFIG 26, MMIX-PIPE 76, 81, 82, 237, 243, 244.
code: MMIXAL 62, 64.
command line arguments: MMIX-SIM 2, 6, 163.
command\_buf: MMIX-SIM 149, 150, 151.
command\_buf\_size: MMIX-SIM 150, 151.
commit\_max: MMIX-CONFIG 15, MMIX-PIPE 59, 67, 145, 330.
compare-and-swap: MMIX 31.
complement: MMIXAL 82, 86, 100.
config file line...: MMIX-CONFIG 10.
config\_file: MMIX-CONFIG 9, 10, 19, 38.
config\_file\_name: MMMIX 2, 3.
Configuration error...: MMIX-CONFIG 20, 23, 24, 25, 29, 31, 35, 37.
Configuration syntax error...: MMIX-CONFIG 19, 23.
confusion: MMIX-PIPE 13, 28, 135, 185, 187.
constant doesn't fit...: MMIXAL 117.
constant\_found: MMIXAL 92, 93, 94, 95, 96.
continuous profiling: MMIX 40.
control: MMIX-CONFIG 37, MMIX-PIPE 44, 45, 46, 60, 63, 73, 78, 124, 127, 158, 159, 167, 230, 235, 248, 254, 255, 285, 357.
control\_struct: MMIX-PIPE 23, 44.
cool: MMIX-PIPE 60, 61, 63, 67, 69, 75, 78, 81, 82, 84, 85, 86, 98, 99, 100, 102, 103, 104, 105, 106, 108, 109, 110, 111, 112, 113, 114, 117, 118, 119, 120, 121, 122, 123, 145, 152, 158, 160, 227, 308, 309, 312, 314, 316, 322, 323, 324, 332, 333, 334, 335, 337, 338, 339, 340, 341, 347, 355, 372, MMMIX 18, 19.
cool\_G: MMIX-PIPE 99, 102, 104, 105, 106, 110, 117, 119, 120, 312, 323, 335, 337.
cool\_hist: MMIX-PIPE 74, 75, 99, 151, 152, 160, 308, 309, 316.
cool\_L: MMIX-PIPE <u>99</u>, 102, 104, 105, 106, 110, 112, 114, 119, 120, 312, 323, 337, 338.
cool\_O: MMIX-PIPE 75, 98, 100, 104, 105, 106, 110, 112, 114, 117, 118, 119, 120, 145, 147, 333, 337, 338, 339, MMMIX 19.
cool\_S: MMIX-PIPE 75, 98, 100, 110, 113, 114, 118, 119, 120, 145, 147, 337, MMMIX 19.
copy\_block: MMIX-PIPE 184, 185, 217, 221.
copy\_in\_time: MMIX-CONFIG 16, 23, MMIX-PIPE 167, 217, 222, 224, 237, 276.
copy\_out\_time: MMIX-CONFIG 16, 23, MMIX-PIPE 167, 203, 221, 233, 234, 259.
coroutine: MMIX-CONFIG 26, 34, 36, MMIX-PIPE 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 44, 76, 124, 127, 167, 222, 224, 230, 235, 237, 248, 285.
coroutine\_bit: MMIX-PIPE 8, 10, 125.
coroutine\_struct: MMIX-PIPE 23.
cotm: MMIX-CONFIG 13, 15, 23.
count: MMIX-PIPE 216, 219, 223, MMOTYPE 9, 11, 12, 14, 25, 30.
count\_bits: MMIX-ARITH 26, 28, MMIX-PIPE 21, 344, MMIX-SIM <u>13</u>, 87.
counting leading zeros: MMIX 28.
counting ones: MMIX 12.
counting trailing zeros: MMIX 37.
CPV: MMIX-CONFIG 15, 16, 17, 23.
CPV\_size: MMIX-CONFIG <u>15</u>, 17, 23.
cpv\_spec: MMIX-CONFIG 13, 15, 17.
cset: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 345.
CSEV: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
CSEVI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92.
CSN: MMIX 16, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92, MMIXAL 63.
CSNI: MMIX-PIPE 47, MMIX-SIM 54, 92.
CSNN: MMIX 16, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92, MMIXAL 63.
CSNNI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92.
CSNP: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
CSNPI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92.
CSNZ: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
CSNZI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92.
CSOD: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
CSODI: MMIX-PIPE 47, MMIX-SIM 54, 92.
CSP: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
CSPI: MMIX-PIPE 47, MMIX-SIM 54, 92.
CSWAP: MMIX 31, 50, MMIX-PIPE 47, 271, 281, MMIX-SIM <u>54</u>, 96, MMIXAL 63.
cswap: MMIX-PIPE 49, 51, 110, 117, 283, 307.
CSWAPI: MMIX-PIPE 47, MMIX-SIM 54, 96.
CSZ: MMIX 16, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92, MMIXAL 63.
CSZI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92.
ctl: MMIX-CONFIG 16, MMIX-PIPE 23, 30, 31, 32, 44, 81, 124, 125, 128, 134, 222, 224, 231, 236, 243, 244, 245, 249, <u>255</u>, 286.
ctl\_change\_bit: MMIX-PIPE 81, 83, 85.
cur\_arg: MMIX-SIM 142, <u>144</u>, 163.
cur\_dat: MMMIX 5, 8, 9, 10, 11, 12.
cur\_disp\_addr: MMIX-SIM <u>151</u>, 152, 156, 157, 159.
cur\_disp\_mode: MMIX-SIM 151, 152, 156, 157, 159.
cur\_disp\_set: MMIX-SIM 151, 152, 153, 156.

cur\_disp\_type: MMIX-SIM 151, 153, 159.
cur\_file: MMIX-SIM 30, 31, 32, 35, 42, 44, 45, 47, 49, 51, 53, 63, MMIXAL 36, 38, 45, 50, MMOTYPE 15, 16, 17, 20, 21.
cur\_greg: MMIXAL 108, 109, 134, 143.
cur\_line: MMIX-SIM 30, 31, 32, 35, 47, 51, 53, 63, 82, 83, 103, 105, 128, MMOTYPE 15, 16, 17, 20, 21.
cur\_loc: MMIX-SIM 30, 31, 32, 33, 34, 50, 51, <u>162</u>, <u>165</u>, MMIXAL <u>42</u>, <u>43</u>, <u>49</u>, 52, 53, 96, 107, 109, 110, 114, 115, 118, 125, 126, 130, 131, 132, MMMIX <u>5</u>, 7, 8, 9, 11, 12, MMOTYPE 15, <u>16</u>, 17, 18, 19, 21.
cur\_O: MMIX-PIPE 44, 46, 100, 145, 147, MMMIX 19.
cur\_prefix: MMIXAL <u>56</u>, 61, 87, 111, 132.
cur\_round: MMIX-ARITH 30, 40, 43, 45, 46, 47, 86, 88, 89, 91, MMIX-PIPE 20, 346, MMIX-SIM 13, 77, 89, 100, 158.
cur\_S: MMIX-PIPE 44, 46, 100, 145, 147, MMMIX 19.
cur\_seg: MMIX-SIM 151, 152, 161.
cur\_time: MMIX-PIPE 28, 29, 125.
cycs: MMIX-PIPE 9, 10.
d: MMIX-ARITH <u>13</u>, <u>27</u>, <u>46</u>, <u>50</u>, MMIX-PIPE <u>28</u>, <u>31</u>, <u>97</u>, <u>170</u>, <u>197</u>, <u>201</u>, <u>203</u>, <u>220</u>, MMMIX <u>16</u>.
D\_BIT: MMIX-ARITH 31, MMIX-PIPE 54, 308, 343, MMIX-SIM <u>57</u>, 88, MMIXAL 69.
D\_Handler: MMIXAL 69.
Danger: MMIXAL 142.
dat: MMIX-ARITH 59, 60, 61, 62, 63, 64, 65, 79, 80, 82, 83, MMIX-SIM 16, 20, 50, 51, 162, 165, MMIXAL 52.
data: MMIX-CONFIG 31, 32, 33, MMIX-PIPE 124, 125, 130, 131, 132, 133, 134, 135, 137, 138, 139, 140, 141, 142, 143, 144, 155, 156, 160, 167, 172, 179, 185, 197, 201, 203, 215, 216, 217, 218, 219, 220, 222, 223, 224, 225, 226, 232, 233, 234, 237, 239, 243, 244, 245, 257, 259, 260, 261, 262, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 288, 289, 291, 292, 293, 294, 295, 296, 297, 298, 300, 301, 302, 304, 307, 308, 309, 310, 313, 325, 326, 327, 328, 329, 330, 331, 336, 342, 343, 344, 345, 346, 348, 350, 351, 352, 353, 354, 356, 357, 358, 359, 360, 361, 363, 364, 365, 366, 367, 368, 369, 370, 378, 379, MMMIX 12, 23.
Data\_Segment: MMIX-SIM 3, MMIXAL 69.
Dcache: MMIX-CONFIG 17, 21, 35, 36, MMIX-PIPE 39, 128, <u>168</u>, 215, 217, 222, 227, 228, 233, 234, 257, 259, 261, 262, 263, 265, 267, 268, 271, 273, 274, 275, 276, 280, 360, 364, 366, 378, 379, MMMIX 21.
Dclean: MMIX-PIPE 233.
Dclean\_inc: MMIX-PIPE 233.
Dclean\_loop: MMIX-PIPE 233.
dd: MMIX-PIPE 197, 203.
dderr: MMIXAL 45, 114.
Dean, Jeffrey Adgate: MMIX 40.
dec\_pt: MMIX-ARITH 73, 74, 76, 77, 79.
decgamma: MMIX-PIPE 49, 114, 147, 327.
decimal: MMIX-SIM 133, 135, 137.
decimal-to-binary conversion: MMIX-ARITH 68.
default\_go: MMIX-PIPE 46.
DEFINED: MMIXAL 58, 74, 78, 87, 91, 109, 115.
defval: MMIX-CONFIG <u>12</u>, <u>13</u>, <u>14</u>, 16, 17.
deissues: MMIX-PIPE 60, 61, 63, 64, 67, 145, 160, 308, 309, 316, MMMIX 18.
del: MMIX-PIPE <u>216</u>.
delay: MMIX-PIPE 219.
delink: MMIXAL 99, 100.
delta: MMIX-ARITH <u>6</u>, <u>93</u>, 94, MMIX-PIPE <u>21</u>, MMIX-SIM 13, 25, 34, MMIXAL 28, MMMIX 25, MMOTYPE 1, 8, 19.
demote\_and\_fix: MMIX-PIPE 198, 199, 233, 234, 268, 271, 273, 353, 354, 365, 366, 367.
demote\_usage: MMIX-PIPE 190, 191, 199.
denin: MMIX-PIPE 44, 100, 133, 346, 348.
denin\_penalty: MMIX-CONFIG 15, MMIX-PIPE 279, 346, 348, <u>349</u>, 350.
denout: MMIX-PIPE 44, 100, 133, 134, 346.
denout\_penalty: MMIX-CONFIG 15, MMIX-PIPE 281, 346, <u>349</u>, 351.
derr: MMIXAL 45, 86, 97, 100, 101, 102, 103, 104, 109, 116, 117, 121, 122, 123, 124, 129.
dest: MMIXAL 126, 131.
die: MMIX-PIPE 144, 160, 265, 308, 309, 310.
dig: MMIX-SIM 15.
dirty: MMIX-CONFIG 31, 32, 33, MMIX-PIPE <u>167</u>, 170, 172, 179, 181, 185, 197, 201, 203, 216, 221, 259, 262.
dirty\_only: MMIX-PIPE 176, 177.
dispatch\_count: MMIX-PIPE 64, 65, 81.
dispatch\_done: MMIX-PIPE 101, 112, 113, 114, 332.
dispatch\_lock: MMIX-PIPE 39, 64, 65, 75, 81, 85, 310, 329, 330, 356.
dispatch\_max: MMIX-CONFIG 15, 37, MMIX-PIPE <u>59</u>, 74, 75, 85, 162.
dispatch\_stat: MMIX-CONFIG 37, MMIX-PIPE 64, <u>66</u>, 162.
Ditzel, David Roger: MMIX 42.
DIV: MMIX 20, 50, MMIX-PIPE 47, MMIX-SIM <u>54</u>, 88, MMIXAL 63.
div: MMIX-CONFIG 15, 27, 28, MMIX-PIPE 2, <u>49</u>, 51, 121, 343.
DIVI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 88.
divide check exception: MMIX 20, 32.
division by zero: MMIXAL 101.
DIVU: MMIX 20, MMIX-PIPE 47, MMIX-SIM 54, 88, MMIXAL 63.
divu: MMIX-CONFIG 28, MMIX-PIPE <u>49</u>, 51, 121, 343.
DIVUI: MMIX-PIPE 47, MMIX-SIM 54, 88.
Dlocker: MMIX-PIPE 127, 128, 276.
do\_resume\_trans: MMIX-PIPE 325, 326.

do\_syncd: MMIX-PIPE 280, 364, 369.
do\_syncid: MMIX-PIPE 280, 364, 369.
doing\_interrupt: MMIX-PIPE 63, 64, 65, 314, 317, 318.
done: MMIX-ARITH 65, MMIX-IO 12, 13, MMIX-PIPE 125, 134, 233, 234, MMMIX 13.
done\_with\_write: MMIX-PIPE 256.
down: MMIX-PIPE 40, 86, 89, 95, 97, 116.
dpanic: MMIXAL 45, 47, 138, 142.
DPTco: MMIX-PIPE <u>235</u>, 236, 237.
DPTctl: MMIX-PIPE <u>235</u>, 236.
DPTname: MMIX-PIPE 235, 236.
DT\_hit: MMIX-PIPE 267, 268, 270, 271, 272, 273.
DT\_miss: MMIX-PIPE 267, 270, 272.
DT\_retry: MMIX-PIPE 272.
DTcache: MMIX-CONFIG 17, 21, 35, MMIX-PIPE 39, 128, 168, 236, 237, 266, 267, 268, 270, 272, 325, 353, 358, MMMIX 21, 23.
dump: MMIX-SIM 164, 165.
dump\_file: MMIX-SIM 144, 146, 164, 166.
dump\_tet: MMIX-SIM 164, 165, 166.
DUNNO: MMIX-PIPE 254, 255, 268, 270, 271, 278.
dynamic traps: MMIX 35, 37.
e: MMIX-ARITH 31, 34, 37, 38, 39, 40, 50, 56, 89.
E\_BIT: MMIX-ARITH 31, 93, 94, MMIX-PIPE 54, 56, 306, 314, 317, 351.
ee: MMIX-ARITH <u>37</u>, <u>38</u>, <u>50</u>, 51, 53.
ef: MMIX-ARITH 50, 51, 53.
Emerson, Ralph Waldo: MMIX 7.
emulate\_virt: MMIX-PIPE 272, 310, 327.
emulation: MMIX 24, 27, 33, 36, 38, 47, 49, MMIX-CONFIG 6.
end\_simulation: MMIX-SIM 141, 149.
EOF: MMIXAL 34, 35, MMMIX 10.
eof: MMIX-IO 14, 15, 16, 17.
eps: MMIX-PIPE 21, MMIX-SIM 13.
equiv: MMIXAL 58, 59, 64, 66, 70, 75, 76, 78, 82, 87, 94, 98, 99, 100, 101, 104, 108, 109, 110, 113, 114, 116, 117, 118, 121, 122, 123, 124, 125, 126, 127, 129, 130, 131, 132, 134, 144.
equiv\_buf: MMOTYPE 28, 29.
err: MMIXAL 35, 45, 93, 95, 98, 99, 100, 101, 106, 108, 109, 117, 118, 121, 122, 123, 124, 126, 127, 129, 131, 132, MMOTYPE 13, 14, 18, 19, 20, 22.
err\_buf: MMIXAL 32, 33, 45.
err\_count: MMIXAL 45, 46, 79, 142, 145.
Error in tetra...: MMOTYPE 14.
errprint\_coroutine\_id: MMIX-PIPE 24, 25, 28.
errprint0: MMIX-CONFIG <u>8</u>, 18, 24, 35, 36, 37, MMIX-PIPE <u>13</u>, 22, 25.
errprint1: MMIX-CONFIG <u>8</u>, 10, 16, 19, 20, 23, 24, 25, 29, 31, 32, 33, 34, 38, MMIX-PIPE 13, 14, 28, 213.
errprint2: MMIX-CONFIG 8, 20, 23, 31, 32, 33, MMIX-PIPE 13, 14, 25, 210.
errprint3: MMIX-CONFIG 8, 32.
es: MMIX-ARITH 50.
ESPEC: MMIXAL 20, 43, 62, 63, 132.
et: MMIX-ARITH 50.
ex: MMIX-ARITH 90.
exc: MMIX-SIM 60, 61, 84, 85, 87, 88, 89, 90, 95, 108, 122, 123, 126, 131.
exceptions: MMIX 32.
exceptions: MMIX-ARITH 31, 32, 33, 35, 36, 37, 38, 40, 41, 42, 44, 46, 68, 86, 88, 89, 90, 91, 93, 94, MMIX-PIPE 20, 281, 346, 350, 351, MMIX-SIM 13, 89, 95.
exec\_bit: MMIX-SIM 58, 63, 161, 162.
exit: MMIX-CONFIG 8, 38, MMIX-PIPE 14, MMIX-SIM 7, 14, 24, 26, 35, 143, 164, MMIXAL 45, 137, 142, MMMIX 3, 6, 7, 8, 9, 11, 12, MMOTYPE 2, 3, 9, 20, 23, 25, 26.
exp: MMIX-ARITH 71, 73, 76, 77, 79, 83, 84.
exp\_sign: MMIX-ARITH 77.
expanding: MMIXAL 127, 137, 139.
expire: MMIX-PIPE 13, 14.
**Extern**: MMIX-PIPE <u>4</u>, <u>5</u>, 9, 29, 38, 59, 60, 66, 69, 77, 86, 87, 98, 115, 136, 150, 161, 168, 175, 178, 180, 207, 209, 211, 212, 214, 242, 247, 252, 284, 349.
Extra bytes follow...: MMOTYPE 30.
f: MMIX-ARITH 31, 34, 37, 38, 39, 40, 56, 60, 61, 62, 82, MMIX-IO 10, MMIX-PIPE 75, MMIX-SIM 62.
F\_BIT: MMIX-PIPE 54, 122, 256, 302, 306, 309, 310, 313, 314, 317, 320, 321, 327.
FADD: MMIX 22, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
fadd: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346.
fake\_stdin: MMIX-SIM 144, 145.
false: MMIX-ARITH 1, 24, 68, MMIX-CONFIG 10, 15, MMIX-PIPE 11, 12, 59, 75, 81, 100, 112, 113, 114, 146, 147, 170, 179, 201, 203, 205, 221, 244, 259, 269, 301, 304, 314, 323, 324, 330, 332, 337, 340, 351, 363, 369, MMIX-SIM 9, 49, 51, 60, 87, 88, 90, 128, 131, 133, 141, 143, 149, 150, 152, 153, 161, MMIXAL 26, 34, 64, 66, 70, 118, 125, 130, 132, MMMIX 8, 9, 10, 11, 12, 21, 22, 23.
Fascicle 1: MMIX 4, MMIX-SIM 165, MMMIX 9.
Fclose: MMIXAL 69.
Fclose: MMIX-PIPE 371, 372, MMIX-SIM 59,
fclose: MMIX-IO 8, 11, MMIX-SIM 4, 32, 145, 150, MMMIX 4.
FCMP: MMIX 23, MMIX-PIPE 47, MMIX-SIM 54, 90, MMIXAL 63.
fcmp: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 348.
FCMPE: MMIX 25, MMIX-PIPE 47, 348, MMIX-SIM 54, 90, MMIXAL 63.
fcomp: MMIX-ARITH 85, MMIX-PIPE 21, 346, 348, MMIX-SIM 13, 89, 90.

FDIV: MMIX 22, 50, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
fdiv: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346.
fdivide: MMIX-ARITH 44, MMIX-PIPE 21, 346, MMIX-SIM 13, 89.
feof: MMIX-IO 15, 17, MMIX-SIM 42.
feps: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 348.
fepscomp: MMIX-ARITH 50, MMIX-PIPE 21, 348, MMIX-SIM 13, 90.
FEQL: MMIX 23, MMIX-PIPE 47, MMIX-SIM 54, 90, MMIXAL 63.
FEQLE: MMIX 25, MMIX-PIPE 47, 348, MMIX-SIM <u>54</u>, 90, MMIXAL 63.
ferror: MMIX-IO 13.
fetch: MMIX-CONFIG 37, MMIX-PIPE 68, 69, 70, 73, 74, 301, MMMIX 22.
fetch\_bot: MMIX-CONFIG 37, MMIX-PIPE 69, 73, 74, 75, 301, MMMIX 22.
fetch\_buf\_size: MMIX-CONFIG 15, 37.
fetch\_co: MMIX-PIPE 285, 286, 287.
fetch\_ctl: MMIX-PIPE 285, 286.
fetch\_hi: MMIX-PIPE 285, 294, 297, 301.
fetch\_lo: MMIX-PIPE 285, 294, 297, 301, 304.
fetch\_max: MMIX-CONFIG 15, MMIX-PIPE 59, 284, 301.
fetch\_one: MMIX-PIPE 301.
fetch\_ready: MMIX-PIPE 285, 291, 292, 296, 297, 299, 301.
fetch\_retry: MMIX-PIPE 298, 300.
fetch\_top: MMIX-CONFIG 37, MMIX-PIPE 69, 71, 73, 74, 75, 301, MMMIX 22.
fetched: MMIX-CONFIG 37, MMIX-PIPE 284, 285, 294, 297, 301, 304.
ff: MMIX-ARITH 63, 64, 65, 66, 79, 80, 81, 83.
fflush: MMIX-IO 18, 19, 20, MMIX-PIPE 387, MMIX-SIM 4, 120, 150, MMMIX 13.
fgetc: MMIXAL 34, 35, MMMIX 10.
Fgets: MMIX-SIM 4, MMIXAL 69.
Fgets: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.
fgets: MMIX-CONFIG 10, 38, MMIX-IO 15, MMIX-MEM 2, MMIX-PIPE 387, MMIX-SIM 4, 42, 45, 120, 150, MMIXAL 34, MMMIX 6, 13.
Fgetws: MMIX-SIM 4, MMIXAL 69.
Fgetws: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.
fgetws: MMIX-SIM 4.
file: MMIX-SIM <u>4</u>.
File...was modified: MMIX-SIM 44.
file\_info: MMIX-SIM 35, 40, 42, 44, 45, 49.
file\_name: MMOTYPE 15, <u>16</u>, 20, 21.
file\_no: MMIX-SIM 16, 30, 51, 63.
file\_node: MMIX-SIM 38, 40.
filename: MMIX-CONFIG 38, MMIXAL 36, 37, 38, 45, 50, 140.
filename\_count: MMIXAL 37, 38, 140.
FILENAME\_MAX: MMIX-IO <u>2</u>, 8, MMIXAL 38, 39, 139.
filename\_passed: MMIXAL 50, 51.
fill\_from\_mem: MMIX-CONFIG 35, MMIX-PIPE 129, 222, 224, 237.
fill\_from\_S: MMIX-CONFIG 35, MMIX-PIPE 129, 224, 237.
fill\_from\_virt: MMIX-CONFIG 35, MMIX-PIPE 129, 237, 242.
fill\_lock: MMIX-PIPE 167, 174, 222, 224, 225, 226, 237, 257, 261, 272, 274, 298, 300.
filler: MMIX-CONFIG 16, 35, MMIX-PIPE 167, 176, 195, 196, 204, 218, 224, 225, 261, 272, 274, 276, 298, 300, 326.
filler\_ctl: MMIX-CONFIG 16, MMIX-PIPE 167, 176, 225, 236, 261, 272, 274, 298, 300, 326.
fin\_bflot: MMIX-PIPE 346.
fin\_bin: MMIXAL 99, 101.
fin\_ex: MMIX-PIPE 135, 144, 155, 266, 269, 271, 272, 273, 274, 276, 279, 281, 283, 296, 298, 300, 301, 313, 325, 326, 327, 328, 329, 331, 336, 342, 345, 346, 350, 351, 356, 360, 363, 364, 370.
fin\_float: MMIX-SIM 89.
fin\_flot: MMIX-PIPE 346.
fin\_ld: MMIX-PIPE 279, MMIX-SIM 94.
fin\_pst: MMIX-SIM 95.
fin\_st: MMIX-PIPE 281, MMIX-SIM 95.
fin\_uflot: MMIX-PIPE 346.
fin\_unifloat: MMIX-SIM 89.
finish\_store: MMIX-PIPE 272, 279, 280.
FINT: MMIX 22, 24, 28, MMIX-PIPE 47, MMIX-SIM <u>54</u>, 89, MMIXAL 63.
fint: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346, 347.
fintegerize: MMIX-ARITH 86, 88, MMIX-PIPE <u>21</u>, 346, MMIX-SIM <u>13</u>, 89.
first: MMIX-PIPE 216.
FIX: MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM <u>54</u>, 89, MMIXAL 63.
fix: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346, 347.
fix\_o: MMIXAL 58, 112, 118.
fix\_xyz: MMIXAL 58, 114, 130.
fix\_yz: MMIXAL <u>58</u>, 114, 125.
fixit: MMIX-ARITH 88, MMIX-PIPE 21, 346, MMIX-SIM 13, 89.
fixr: MMIX-SIM 34, MMOTYPE 19.
FIXU: MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM <u>54</u>, 89, MMIXAL 63.
flags: MMIX-PIPE 80, 81, 83, 312, 320, MMIX-SIM 60, <u>64</u>, 65.
float-to-fix exception: MMIX 27, 32.
floating: MMIX-SIM 134, <u>135</u>, 137.
floating point arithmetic: MMIX 21.
floatit: MMIX-ARITH 89, MMIX-PIPE 21, 346, MMIX-SIM <u>13</u>, 89.
FLOT: MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
flot: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346, 347.

FLOTI: MMIX-PIPE 47, MMIX-SIM 54, 89.
FLOTU: MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM <u>54</u>, 89, MMIXAL 63.
FLOTUI: MMIX-PIPE 47, MMIX-SIM 54, 89.
flush\_cache: MMIX-PIPE 202, 203, 205, 217, 233, 234, 263.
flush\_listing\_line: MMIXAL 41, 42, 44, 45, 115, 132, 134, 136.
flush\_to\_mem: MMIX-CONFIG 35, MMIX-PIPE 129, 215.
flush\_to\_S: MMIX-CONFIG 35, MMIX-PIPE 129,
flusher: MMIX-CONFIG 16, 35, MMIX-PIPE 167, 176, 202, 203, 204, 205, 215, 217, 221, 233, 234, 259, 263.
flusher\_ctl: MMIX-CONFIG 16, MMIX-PIPE 167.
fmt\_style: MMIX-SIM <u>135</u>, 137.
FMUL: MMIX 22, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
fmul: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346.
fmult: MMIX-ARITH 41, MMIX-PIPE 21, 346, MMIX-SIM 13, 89.
Fopen: MMIX-SIM 4, MMIXAL 69.
Fopen: MMIX-PIPE <u>371</u>, 372, MMIX-SIM <u>59</u>, 108.
fopen: MMIX-CONFIG 38, MMIX-IO 8, MMIX-SIM 4, 24, 49, 145, 146, 150, MMIXAL 138, MMMIX 6, 9, MMOTYPE 3.
forced traps: MMIX 35, 36.
forward\_local: MMIXAL 90, 91, 111, 145.
forward\_local\_host: MMIXAL 88, 90, 91.
found: MMIX-SIM 21.
fp: MMIX-IO 5, 7, 8, 10, 11, 13, 15, 17, 18, 19, 20, 21, 22, 23.
fpack: MMIX-ARITH 31, 34, 36, 39, 43, 45, 46, 47, 49, 84, 87, 89, 92, 94.
fplus: MMIX-ARITH 46, MMIX-PIPE 21, 346, MMIX-SIM <u>13</u>, 89.
fprintf: MMIX-CONFIG 8, MMIX-IO 23, MMIX-PIPE 13, 381, 384, MMIX-SIM 14, 24, 26, 35, 44, 49, 143, 145, 146, MMIXAL 30, 35, 41, 42, 44, 45, 78, 79, 80, 115, 132, 134, 137, 142, 145, MMMIX 3, 6, 7, 8, 9, 10, 11, 12, MMOTYPE 2, 3, 9, 14, 20, 23, 25, 26, 30.
fputc: MMIX-SIM 133, 137, 138, 156, 159, 166.
Fputs: MMIX-SIM 4, MMIXAL 69.
Fputs: MMIX-PIPE <u>371</u>, 372, MMIX-SIM <u>59</u>, 108.
fputs: MMIX-SIM 4.
Fputws: MMIX-SIM 4, MMIXAL 69.
Fputws: MMIX-PIPE 371, 372, MMIX-SIM 59,
fputws: MMIX-SIM 4.
frac: MMIXAL 82, 97, 101.
frame pointer: MMIXAL 18.
Fread: MMIX-SIM 4, MMIXAL 69.
Fread: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.
fread: MMIX-IO 13, 17, MMIX-PIPE 387, MMIX-SIM 4, 26, 120, MMOTYPE 9, 30.
free: MMIX-IO 12, 13, MMIX-SIM 24.
freeze\_dispatch: MMIX-PIPE 75, 81, 118, 355.
FREM: MMIX 22, 34, 50, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
frem: MMIX-CONFIG 27, 28, MMIX-PIPE 49, 51, 320, 350, 351.
frem\_max: MMIX-CONFIG 15, MMIX-PIPE 349,
fremstep: MMIX-ARITH 93, MMIX-PIPE 21, 350, 351, MMIX-SIM 13, 89.
freopen: MMIX-SIM 4, 49.
freq: MMIX-SIM 16, 50, 51, 63, 130.
froot: MMIX-ARITH 91, MMIX-PIPE 21, 346, MMIX-SIM 13, 89.
Fseek: MMIX-SIM 4, MMIXAL 69.
Fseek: MMIX-PIPE <u>371</u>, 372, MMIX-SIM <u>59</u>, 108.
fseek: MMIX-IO 21, MMIX-SIM 4, 45.
FSQRT: MMIX 22, 28, 50, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
fsqrt: MMIX-CONFIG 15, 28, MMIX-PIPE 7, 49, 51, 346, 347.
FSUB: MMIX 22, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
fsub: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 346.
Ftell: MMIX-SIM 4, MMIXAL 69.
Ftell: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.
ftell: MMIX-IO 22, MMIX-SIM 4, 42.
ftype: MMIX-ARITH 36, 37, 38, 39, 40, 41, 44, 46, 85, 86, 88, 91, 93.
FUN: MMIX 23, MMIX-PIPE 47, 348, MMIX-SIM 54, 90, MMIXAL 63.
func: MMIX-CONFIG 18, MMIX-PIPE 75, <u>76</u>, 77, 79.
func\_struct: MMIX-PIPE 76.
FUNE: MMIX 25, MMIX-PIPE 47, 348, MMIX-SIM <u>54</u>, 90, MMIXAL 63.
funeq: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 348.
funit: MMIX-CONFIG 18, 25, 26, 29, MMIX-PIPE <u>77</u>, 79, 82, MMMIX 24.
funit\_count: MMIX-CONFIG 18, 19, 25, 26, MMIX-PIPE 77, 79, 82, MMMIX 24.
funpack: MMIX-ARITH 36, 37, 40, 41, 44, 46, 50, 85, 86, 88, 91, 93.
future reference cannot...: MMIXAL 109.
future\_bits: MMIXAL 116, 119, 120, 125, 130.
fwprintf: MMIXAL 30.
Fwrite: MMIX-SIM 4, MMIXAL 69.
Fwrite: MMIX-PIPE 371, 372, MMIX-SIM 59,
fwrite: MMIX-IO 18, 19, 20, MMIX-SIM 4, MMIXAL 47.
G: MMIX <u>29</u>, MMIX-SIM <u>75</u>.
g: MMIX-ARITH <u>56</u>, <u>61</u>, <u>62</u>, MMIX-PIPE <u>86</u>, <u>167</u>, <u>172</u>, MMIX-SIM <u>76</u>.
gap: MMIX-SIM 47, 48, 53, 128, 143.
GET: MMIX 43, MMIX-PIPE 47, MMIX-SIM 54, 97, MMIXAL 63, MMMIX 12.
get: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 118, 146, 328.

get\_int: MMIX-CONFIG 11, 20, 23, 24.
get\_reader: MMIX-PIPE 182, 183, 233, 257, 266, 267, 271, 272, 273, 288, 291, 296, 353, 354, 358, 359, 360, 365, 366.
get\_token: MMIX-CONFIG 10, 11, 18, 19, 22, 23, 25.
GETA: MMIX 18, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.
GETAB: MMIX-PIPE 47, MMIX-SIM 54, 85.
gg: MMIX-ARITH 63, 64, 65, 66, MMIX-CONFIG 16, 23, 31, 35, MMIX-PIPE 167, 170, 172, 216.
Ghemawat, Sanjay: MMIX 40.
Gill, Stanley: MMIX-ARITH 26.
Gillies, Donald Bruce: MMIX-ARITH 26.
global registers: MMIX 29.
GO: MMIX 19, MMIX-PIPE 47, 235, MMIX-SIM 54, 107, MMIXAL 63.
go: MMIX-CONFIG 16, 28, MMIX-PIPE 44, 46, 49, 51, 85, 100, 119, 120, 122, 123, 128, 155, 160, 231, 236, 249, 286, 308, 312, 320, 321, 322, 327, 331, 364.
GOI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 107.
good: MMIX-SIM <u>61</u>, 93, 133, 138.
good\_guesses: MMIX-SIM 93, 139, 140.
got\_DT: MMIX-PIPE 272.
got\_greg: MMIXAL 108.
got\_IT: MMIX-PIPE 291, 298.
got\_one: MMIX-PIPE 291, 300, 301.
gran: MMIX-CONFIG 13, 15, 23.
graphics: MMIX 11.
Gray, James Nicholas: MMIX 31.
GREG: MMIXAL 18, 62, 63, 102, 109, 132.
greg: MMIXAL 108, 127, 142, 143, 144.
greg\_val: MMIXAL 108, 127, 133, 144.
h: MMIX-ARITH <u>3</u>, MMIX-IO <u>3</u>, MMIX-PIPE <u>17</u>, <u>151</u>, <u>152</u>, <u>210</u>, <u>213</u>, MMIX-SIM <u>10</u>, MMIXAL <u>26</u>, <u>68</u>, MMOTYPE <u>7</u>.
H\_BIT: MMIX-PIPE 54, 146, 306, 308, 313, 314, 317, 319, 320, 321, MMIX-SIM 57, 108, 122, 123.
h\_down: MMIX-PIPE <u>152</u>.
h\_up: MMIX-PIPE <u>152</u>.
Halt: MMIX-SIM 7, MMIXAL 69.
Halt: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.
halted: MMIX-PIPE 10, 12, 356, 373, MMIX-SIM <u>61</u>, 107, 109, 140, 141, 161.
handle: MMIX-IO 8, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, MMIX-SIM 4, 134, 135, 137.
handlers: MMIX 32, 35, 38.
hardware\_PT: MMIX-CONFIG 15, 37.
hash\_prime: MMIX-CONFIG 15, 37, MMIX-PIPE <u>207</u>, 209, 210, 213.
head: MMIX-PIPE 69, 71, 73, 74, 75, 80, 81, 84, 85, 100, 110, 114, 151, 152, 160, 228, 229, 301, 308, 309, 316, 323, 335, 341, MMMIX 12, 15, 22.
held\_bits: MMIXAL 43, 44, 47, 49, 52.
Hennessy, John LeRoy: MMIX 1, 3, MMIX-PIPE 2, 58, 150, 163.
Henzinger, Monika Hildegard Rauch: MMIX 40.
hex: MMIX-SIM 134, 135, 137.
Hexadecimal file line...: MMMIX 6.
hexadecimal files: MMMIX 5.
hi: MMIX-SIM <u>15</u>.
hist: MMIX-PIPE 44, 46, 68, 75, 85, 100, 160, 308, 309.
hit: MMIX-PIPE 193.
hit\_and\_miss: MMIX-PIPE 267, 268, 271, 273.
hit\_set: MMIX-PIPE 192, 193, 194, 196, 199, 201, 217.
hold\_buf: MMIXAL 43, 44, 47, 52.
hold\_op: MMIXAL <u>85</u>, 98.
holding\_time: MMIX-CONFIG 15, MMIX-PIPE <u>247</u>, 256, 257.
hot: MMIX-PIPE 60, 61, 63, 64, 67, 69, 86, 101, 146, 147, 149, 255, 256, 314, 316, 317, 318, 319, 320, 321, 357, MMMIX 18, 19.
i: MMIX-ARITH 8, 13, MMIX-CONFIG 38, MMIX-PIPE 10, 12, 44, 172, 176, 181, 185, 201, 246, MMIX-SIM 62.
I can't allocate...: MMIX-PIPE 213.
I can't deal with...: MMIXAL 50.
I can't open...: MMIX-SIM 49.
I'm reading this file...: MMOTYPE 23.
I/O: MMIX 33, 37, 44, MMIX-IO 1, MMIX-MEM 1, MMIX-SIM 4.
I\_BIT: MMIX-ARITH 31, 40, 41, 42, 44, 46, 86, 88, 91, 93, MMIX-PIPE 54, 348, MMIX-SIM 57, 90, MMIXAL 69.
I\_Handler: MMIXAL 69.
IBM Corporation: MMIX 31.
Icache: MMIX-CONFIG 17, 21, 35, 36, 37, MMIX-PIPE 39, 128, 168, 222, 227, 229, 265, 280, 291, 292, 294, 296, 300, 359, 364, 365, MMMIX 21.
IEEE/ANSI Standard 754: MMIX 21.
Ihit\_and\_miss: MMIX-PIPE <u>291</u>, 292, 296, 298, 299.
ii: MMIX-PIPE 185, 216.
IIADDU: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85.
IIADDUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85.
illegal character constant: MMIXAL 106.
illegal fraction: MMIXAL 101.
illegal hexadecimal constant: MMIXAL 95.
illegal instructions: MMIX 28, 29, 33, 37, 38, 43, 45, 51.
illegal\_inst: MMIX-PIPE 118, 347, MMIX-SIM 89, 97, 99, 100, 102, 104, 107, 124, 125.
immed\_bit: MMIXAL 62, 121, 124.
immediate operands: MMIX 5, 13.
implied\_loc: MMIX-SIM 51, <u>52</u>, 53.
Improper hexadecimal...: MMMIX 6, 7, 8.
improper local label...: MMIXAL 103.
inbuf: MMIX-CONFIG 31, MMIX-PIPE 167, 200, 201, 219, 220, 222, 223, 226, 245, 379.

incgamma: MMIX-PIPE 49, 113, 147, 323, 327, 338.
INCH: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.
INCL: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.
incl\_file: MMIX-SIM 150, 151.
incl\_read: MMIX-SIM 150.
INCMH: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.
INCML: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.
incomplete...constant: MMIXAL 106.
incomplete\_str: MMIX-SIM 149, 155.
Incorrect implementation...: MMIX-PIPE 22, MMIX-SIM 14.
incr: MMIX-ARITH 6, 33, 53, 55, 73, 87, 92, MMIX-IO 4, 14, 16, 18, 19, 20, MMIX-PIPE 21, 46, 64, 84, 85, 100, 113, 114, 119, 120, 236, 240, 265, 279, 301, 314, 320, 322, 323, 325, 333, 338, 339, 369, 370, 373, 380, 381, 382, 383, 384, 385, 386, MMIX-SIM 13, 30, 33, 34, 37, 51, 60, 63, 70, 82, 83, 93, 101, 102, 103, 104, 105, 106, 108, 109, 115, 116, 118, 119, 127, 140, 152, 154, 155, 156, 162, 163, 165, MMIXAL 28, 47, 52, 94, 95, 107, 126, 131, MMMIX 12, 21, 25, MMOTYPE 8, 15, 18, 19.
increase\_L: MMIX-PIPE 110, 312.
increment...too large: MMOTYPE 19.
incrl: MMIX-PIPE 49, 112, 119, 327.
inexact exception: MMIX 21, 32.
Inf: MMIXAL 69.
inf: MMIX-ARITH 36, 37, 38, 39, 40, 41, 42, 44, 46, 50, 85, 86, 88, 91, 93.
inf\_octa: MMIX-ARITH 4, 39, 41, 44, 46.
infinity: MMIX 21.
info: MMIX-SIM 51, 60, 65, 71, 79, 127, 130, 131.
initialization of a user program: MMIX-SIM 6, 164.
inner\_lp: MMIXAL <u>82</u>, 86, 98.
inner\_rp: MMIXAL <u>82</u>, 97, 98.
Input is not...: MMOTYPE 23.
input/output: MMIX 33, 37, 44, MMIX-IO 1, MMIX-MEM 1, MMIX-SIM 4.
inst: MMIX-PIPE 68, 73, 75, 84, 100, 110, 114, 228, 229, 304, 323, 335, 341, MMIX-SIM 60, 61, 63, 70, 108, 123, 130, MMMIX 12, 22.
inst\_ptr: MMIX-PIPE 71, 73, 81, 85, 119, 120, 122, 123, 160, 284, 288, 290, 294, 301, 302, 304, 308, 309, 310, 312, 314, 322, 323, MMIX-SIM 37, 60, <u>61</u>, 63, 70, 93, 101, 107, 108, 123, 124, 131, 138, 140, 161, 164, MMMIX 12, 15, 22, 23.
INT\_MAX: MMIX-CONFIG 15, 38.
int\_op: MMIX-CONFIG 27, <u>28</u>.
int\_stages: MMIX-CONFIG 27, 28.
interact: MMIX-SIM 149.
interact\_after\_break: MMIX-SIM 61, 107, 141, 143.
interacting: MMIX-SIM 61, 107, 120, 141, 143.
interactive\_help: MMIX-SIM 144, 149.
interactive\_read\_bit: MMIX-MEM 2, MMIX-PIPE 8.
interim: MMIX-PIPE 44, 46, 81, 100, 112, 113, 114, 146, 227, 320, 330, 332, 337, 340, 342, 350, 351, 361, 363, 364, 369.
internal\_op: MMIX-CONFIG 28, MMIX-PIPE 51, 80, 320.
internal\_op\_name: MMIX-PIPE 46, 50.
internal\_opcode: MMIX-CONFIG 14, 28, MMIX-PIPE 44, 49, 51, 246.
interrupt: MMIX-PIPE 44, 46, 59, 68, 73, 81, 100, 118, 122, 132, 140, 141, 144, 146, 149, 160, 256, 266, 269, 271, 281, 282, 288, 301, 302, 304, 306, 307, 308, 309, 310, 313, 314, 317, 319, 320, 321, 322, 323, 327, 329, 330, 331, 332, 336, 337, 343, 346, 348, 350, 351, MMIX-SIM 141, 144, 148, MMMIX 22.
interrupts: MMIX 33, 34, 35, 36, 37, 38, MMIX-PIPE 306, MMIX-SIM 1, 2, 108.
INTERVAL\_TIMEOUT: MMIX-PIPE 57, 314.
invalid exception: MMIX 21, 32.
IPTco: MMIX-PIPE 235, 236, 237.
IPTctl: MMIX-PIPE 235, 236.
IPTname: MMIX-PIPE 235, 236.
IS: MMIXAL 16, 62, 63, 109, 132.
is\_dirty: MMIX-PIPE 169, 170, 177, 205, 233, 234.
is\_load\_store: MMIX-PIPE 307, 310, 316, 320.
is\_subnormal: MMIX-PIPE 346, 348, 350, 351.
is\_trivial: MMIX-PIPE 346, 350.
isalpha: MMIXAL 57.
isdigit: MMIX-ARITH 68, 73, 74, 77, MMIX-SIM 152, 155, MMIXAL 38, 57, 86, 94, 103, 104, 109, 110, 111.
isletter: MMIXAL 57, 86, 103, 104.
isspace: MMIX-CONFIG 10, 38, MMIX-SIM 150, MMIXAL 38, 103, 104, 106.
issue\_bit: MMIX-PIPE 8, 10, 81, 145, 146, 147, 149, 283, 310, 314, 319, 320, 321.
issued\_between: MMIX-PIPE 158, 159, 160, 308, 309, 316.
isxdigit: MMIX-SIM 154, 161, MMIXAL 95.
IT\_hit: MMIX-PIPE 291, 292, 295, 296, 298, 299.
IT\_miss: MMIX-PIPE 291, 295, 298, 299.
ITcache: MMIX-CONFIG 17, 21, 35, MMIX-PIPE 39, 128, <u>168</u>, 236, 237, 288, 291, 292, 293, 295, <u>298</u>, 302, 325, 354, 360, MMMIX 12, 21, 23.
IVADDU: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85.
IVADDUI: MMIX-PIPE 47, MMIX-SIM 54, 85.
j: MMIX-ARITH 8, 13, 56, MMIX-CONFIG 23, 30, 31, 38, MMIX-PIPE 10, 12, 56, 162, <u>170</u>, <u>172</u>, <u>176</u>, <u>179</u>, <u>181</u>, <u>183</u>, <u>185</u>, <u>189</u>, 191, 203, MMIX-SIM 15, 50, 62, 162, 165, MMIXAL 44, 50, 52, 74, 136, MMMIX 17, <u>24</u>, MMOTYPE <u>1</u>, <u>26</u>.
jj: MMIX-PIPE 185, MMIXAL 52.
JMP: MMIX 19, MMIX-PIPE 47, MMIX-SIM 54, 70, 107, MMIXAL 63.
jmp: MMIX-PIPE 49, 51, 84, 85, 327.
JMPB: MMIX-PIPE 47, MMIX-SIM 54, 70, 107.
just\_traced: MMIX-SIM 128, 129.
k: MMIX-ARITH 8, 13, 29, 56, 81, 91, MMIX-CONFIG 31, MMIX-PIPE 76, MMIX-SIM 42, <u>45</u>, <u>47</u>, <u>62</u>, <u>82</u>, <u>83</u>, <u>143</u>, <u>160</u>, MMIXAL <u>42</u>, <u>44</u>, 52, 136, MMMIX 17, MMOTYPE 26.
K\_BIT: MMIX-PIPE 54, 118, 322.
keep: MMIX-PIPE 202, 203.
key: MMIX-PIPE 210, 213, MMIX-SIM 20, 21, 22.
kind: MMIX-MEM <u>1</u>, 2.
known: MMIX-PIPE 40, 43, 44, 46, 59, 85, 89, 93, 100, 102, 112, 119, 120, 131, 132, 133, 135, 144, 237, 244, 255, 265, 290, 312, 322, 331, 338, 364.
known\_phys: MMIX-PIPE 296, 298.
Knuth, Donald Ervin: MMIX-ARITH 58.
L: MMIX <u>29</u>, MMIX-SIM <u>75</u>.
l: MMIX-ARITH 3, MMIX-CONFIG 30, MMIX-IO 3, MMIX-PIPE 17, 86, 187, 189, 191, MMIX-SIM 10, 22, 42, 76, MMIXAL 26, 52, 68, MMOTYPE 7.
lab\_field: MMIXAL 32, 33, 102, 103, 104, 109, 110, 111.
label field...ignored: MMIXAL 102.
label syntax error...: MMIXAL 103.
last\_h: MMIX-PIPE 209, 210, <u>211</u>, 213, 216, 219, 223, 297, MMMIX 8, 11.
last\_mem: MMIX-SIM 18, 19, 20, 21.
last\_off: MMIX-PIPE 216.
last\_sym\_node: MMIXAL 59, 60.
last\_trie\_node: MMIXAL 55, <u>56</u>.
ld: MMIX-CONFIG 27, 28, MMIX-PIPE 49, 51, 117, 265, 271, 307, 327, 357.
ld\_ready: MMIX-PIPE <u>267</u>, 268, 270, 271, 273, 274, 277, 278, 279.
ld\_retry: MMIX-PIPE 272, 273, 274.
ld\_st\_launch: MMIX-PIPE <u>265</u>, 266, 354.
LDA: MMIX 7, MMIXAL 13, 18, 63.
LDB: MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM <u>54</u>, 94, MMIXAL 63.
LDBI: MMIX-PIPE 47, MMIX-SIM 54, 94.
LDBU: MMIX 7, MMIX-PIPE <u>47</u>, 279, MMIX-SIM <u>54</u>, 94, MMIXAL 63.
LDBUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94.
LDHT: MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM <u>54</u>, 94, MMIXAL 63.
LDHTI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94.
LDO: MMIX 7, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94, MMIXAL 63.
LDOI: MMIX-PIPE 47, MMIX-SIM 54, 94.
LDOU: MMIX 7, MMIX-PIPE 47, 114, 332, MMIX-SIM 54, 94, MMIXAL 63.
LDOUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94.
LDPTE: MMIX-PIPE 235, 236, 279.
ldpte: MMIX-PIPE 49, 235, 236, 265.
LDPTP: MMIX-PIPE 235, 236, 279.
ldptp: MMIX-PIPE 49, 235, 236, 265.
LDSF: MMIX 26, MMIX-PIPE 47, 271, 279, MMIX-SIM 54, 94, MMIXAL 63.
LDSFI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94.
LDT: MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM <u>54</u>, 94, MMIXAL 63.
LDTI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94.
LDTU: MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM 54, 94, MMIXAL 63.
LDTUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94.
LDUNC: MMIX 30, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94, MMIXAL 63.
ldunc: MMIX-PIPE 49, 51, 117, 265, 268, 271, 273, 357.
LDUNCI: MMIX-PIPE 47, MMIX-SIM 54, 94.
LDVTS: MMIX 46, MMIX-PIPE 47, MMIX-SIM 54, 107, MMIXAL 63.
ldvts: MMIX-PIPE 49, 51, 118, 265, 271, 352.
LDVTSI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 107.
LDW: MMIX 7, MMIX-PIPE <u>47</u>, 279, MMIX-SIM <u>54</u>, 94, MMIXAL 63.
LDWI: MMIX-PIPE 47, MMIX-SIM 54, 94.
LDWU: MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM <u>54</u>, 94, MMIXAL 63.
LDWUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 94.
left: MMIX-SIM <u>16</u>, 21, 22, 50, <u>162</u>, 165, MMIXAL <u>54</u>, 57, 72, 73, 74.
left\_paren: MMIX-SIM 138, 139.
Leung, Shun-Tak Albert: MMIX 40.
lg: MMIX-CONFIG 30, 31.
lhs: MMIX-SIM 80, 101, 107, 131, 133, 138, 139.
lim: MMIX-PIPE <u>185</u>.
line directives: MMIXAL 3.
line\_count: MMIX-SIM 38, 42, 45.
line\_listed: MMIXAL 34, 36, 41, 45, 136.
line\_no: MMIX-SIM 16, 30, 51, 63, MMIXAL 34, <u>36</u>, 38, 45, 50.
line\_shown: MMIX-SIM 45, 48, 51.
link: MMIXAL 58, 59, 64, 66, 70, 74, 75, 78, 82, 87, 91, 94, 99, 100, 109, 110, 112, 116, 118, 125, 130, 132, 145.
Liptay, John S.: MMIX 30.
list: MMIX-ARITH 2, MMIX-IO 2, MMIX-PIPE 6, MMIX-SIM 11, MMIXAL 31, MMOTYPE 5.
listed\_file: MMOTYPE 15, <u>16</u>, 17, 21.
listing: MMOTYPE 2, <u>4</u>, 13, 19, 21, 23, 24.
listing\_bits: MMIXAL <u>43</u>, 44, 47, 52, 136.
listing\_clear: MMIXAL 44, 47, 52, 136.
listing\_file: MMIXAL 41, 42, 44, 45, 47, 52, 75, 78, 80, 109, 115, 132, 134, 136, 138, <u>139</u>.
listing\_loc: MMIXAL 42, <u>43</u>, 44.
listing\_name: MMIXAL 137, 138, <u>139</u>.
literate programming: MMIXAL 3.
little-endian versus big-endian: MMIX 6, 12, MMIX-IO 16, MMIX-PIPE 304, MMIXAL 47, MMMIX 10.

ll: MMIX-SIM 30, 37, 62, 63, 82, 83, 94, 95, 96, 103, 105, 111, 114, 117, 118, 119, 130, 157, 159, 161, 163, 164.
lo: MMIX-SIM 15.
load\_cache: MMIX-PIPE 200, 201, 222, 224, 237.
load\_sf: MMIX-ARITH 39, MMIX-PIPE 21, 279, MMIX-SIM 13, 94.
LOC: MMIXAL 16, 62, 63, 109, 132.
loc: MMIX-IO 23, MMIX-PIPE 44, 46, 68, 73, 80, 81, 84, 85, 100, 118, 119, 122, 144, 149, 151, 152, 160, 236, 266, 271, 296, 304, 320, 322, 323, 331, 355, 364, 368, 372, MMIX-SIM 16, 18, 20, 21, 22, 30, 51, 60, 61, 63, 70, 101, 109, 127, 130, 162, 163, 165, MMMIX 12, 15, 22.
loc\_implied: MMIX-SIM 51.
LOCAL: MMIXAL 19, 62, 63, 132.
local registers: MMIX 29.
localtime: MMOTYPE 23.
lock: MMIX-PIPE 167, 174, 200, 217, 222, 224, 225, 226, 233, 234, 237, 257, 261, 266, 267, 271, 272, 273, 274, 276, 288, 291, 296, 300, 326, 353, 354, 358, 359, 360, 365, 366, 367.
lockloc: MMIX-PIPE 23, 37, 125, 145, 234, 257, 279, 287, 301, 360, 361, 364.
lockvar: MMIX-PIPE 37, 65, 167, 214, 230, 247.
long\_warning\_given: MMIXAL 35, 36.
loop: MMIX-SIM 29, 36, 42, MMOTYPE 13, 21.
lop\_end: MMIX-SIM 23, MMIXAL 23, 24, 80, MMOTYPE 6, 22, 30.
lop\_file: MMIX-SIM 23, 35, MMIXAL 23, 24, 50, MMOTYPE <u>6</u>, 20.
lop\_fixo: MMIX-SIM 23, 34, MMIXAL 23, 24, 113, MMOTYPE 6, 19.
lop\_fixr: MMIX-SIM 23, 34, MMIXAL 23, 24, 114, MMOTYPE 6, 19.
lop\_fixrx: MMIX-SIM 23, 34, MMIXAL 23, 24, 114, MMOTYPE 6, 19.
lop\_line: MMIX-SIM 23, 35, MMIXAL 23, 24, 50, MMOTYPE 6, 20.
lop\_loc: MMIX-SIM 23, 33, MMIXAL 23, 24, 49, MMOTYPE 6, 18.
lop\_post: MMIX-SIM <u>23</u>, 25, 29, MMIXAL 23, <u>24</u>, 144, MMOTYPE <u>6</u>, 22.
lop\_pre: MMIX-SIM 23, 28, MMIXAL 23, 24, 141, MMOTYPE 6, 22, 23.
lop\_quote: MMIX-SIM 23, 29, 33, 36, MMIXAL 23, <u>24</u>, 47, MMOTYPE <u>6</u>, 13, 18, 21.
lop\_quote\_command: MMIXAL 47.
lop\_skip: MMIX-SIM <u>23</u>, 33, MMIXAL 23, <u>24</u>, 49, MMOTYPE 6, 18.
lop\_spec: MMIX-SIM <u>23</u>, 36, MMIXAL 23, <u>24</u>, 132, MMOTYPE 6, 21.
lop\_stab: MMIX-SIM 23, MMIXAL 23, 24, 80, MMOTYPE <u>6</u>, 22, 25.
lopcodes: MMIXAL 22.
lreg: MMIXAL 132, 142, 143.
lring\_mask: MMIX-PIPE 88, 89, 104, 105, 106, 110, 112, 113, 114, 117, 119, 120, 337, 338, MMIX-SIM 72, 73, 74, 76, 77, 80, 81, 82, 83, 101, 102, 104, 157, 159.
lring\_size: MMIX-CONFIG 15, 37, MMIX-PIPE 86, 88, 89, 114, MMIX-SIM 72, 76, 77, 143, MMMIX 18.
lru: MMIX-CONFIG 22, MMIX-PIPE 164, 186, 187, 189, 191.
$\mu$: MMIX 50, MMIX-SIM 1.
m: MMIX-ARITH 27, MMIX-PIPE 12, 187, 189, 191, 268, 270, 271, 278, 381, 384, MMIX-SIM 114, 117, MMIXAL 74, MMMIX 16, MMOTYPE <u>26</u>.
ma: MMIX-PIPE <u>372</u>, 380, MMIX-SIM <u>61</u>, 108, 111, 133, 136.
magic\_done: MMIX-PIPE 372.
magic\_offset: MMIX-ARITH 63.
magic\_read: MMIX-PIPE 377, 378, 380, 381,
magic\_write: MMIX-PIPE 377, 379, 385, 386.
Main: MMIX-SIM 6, MMIXAL 21, 71.
main: MMIX-SIM 141, MMIXAL 136, MMMIX 2, MMOTYPE 1.
make\_it\_infinite: MMIX-ARITH 72, 79.
make\_it\_zero: MMIX-ARITH 79.
make\_ld\_ready: MMIX-PIPE 271.
make\_map: MMIX-SIM 42, 49.
make\_two\_three: MMIXAL 116.
many\_arg\_bit: MMIXAL 62, 116.
map: MMIX-SIM 38, 42, 45, 49.
marginal registers: MMIX 29.
mask: MMIX-ARITH 13, 18, MMIX-PIPE 282.
matrices of bits: MMIX 12.
max: MMIX-PIPE 268, 292.
max\_cycs: MMIX-CONFIG 15, 23, 24, 36.
max\_mem\_slots: MMIX-CONFIG 15, MMIX-PIPE <u>86</u>, 89.
max\_pipe\_op: MMIX-CONFIG 27, MMIX-PIPE 49, 133, 136.
max\_real\_command: MMIX-CONFIG 27, 28, MMIX-PIPE <u>49</u>, 81.
max\_rename\_regs: MMIX-CONFIG 15, MMIX-PIPE 86, 89.
max\_stage: MMIX-CONFIG 36, MMIX-PIPE 26, 129.
max\_sys\_call: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.
maxval: MMIX-CONFIG 12, 13, 20, 23.
mb: MMIX-PIPE <u>372</u>, 380, MMIX-SIM <u>61</u>, 108, 111, 133, 136.
McLellan, Hubert Rae, Jr.: MMIX 42.
mem: MMIX-PIPE 113, 114, <u>115</u>, 116, 117, 227, 236, 246, 249, 254, 255, 265, 333, 334, 339, 355.
mem\_addr\_time: MMIX-CONFIG 15, 36, MMIX-PIPE 214, 216, 219, 225, 260, 261, 271, 274, 277, 297, 300.
mem\_bit: MMIXAL 62, 116, 124.
mem\_bus\_bytes: MMIX-CONFIG 15, 36.

mem\_chunks: MMIX-CONFIG 37, MMIX-PIPE 207, 213.
mem\_chunks\_max: MMIX-CONFIG 15, 37, MMIX-PIPE 206, 207, 213.
mem\_direct: MMIX-PIPE 257.
mem\_find: MMIX-SIM 20, 30, 37, 63, 82, 83, 94, 95, 96, 103, 105, 111, 114, 117, 130, 157, 159, 161, 163, 164.
mem\_hash: MMIX-CONFIG 37, MMIX-PIPE 207, 209, 210, 213, 216, 219, 223, 297, MMMIX 8, 11.
mem\_lock: MMIX-PIPE 39, 214, 215, 219, 222, 225, 260, 261, 271, 274, 277, 297, 300.
mem\_locker: MMIX-PIPE <u>127</u>, 128, 219, 260, 271, 277, 297.
mem\_node: MMIX-SIM 16, 17, 19, 20, 21, 22, 50, 162, 165.
mem\_node\_struct: MMIX-SIM 16.
mem\_read: MMIX-PIPE 208, 209, 210, 219, 222, 271, 277, 297, 378, MMMIX 12, 18.
mem\_read\_time: MMIX-CONFIG 15, 36, MMIX-PIPE <u>214</u>, 219, 222, 223, 271, 277, 297.
mem\_root: MMIX-SIM 18, 19, 21, 53, 161, 164.
mem\_slots: MMIX-PIPE 63, 86, 89, 111, 145, 147, 256.
mem\_tetra: MMIX-SIM 16, 20, 62, 82, 83, 114, 117.
mem\_write: MMIX-PIPE 208, 212, 213, 216, 260, 379, MMMIX 8, 11, 12.
mem\_write\_time: MMIX-CONFIG 15, 36, MMIX-PIPE 214, 216, 260.
mem\_x: MMIX-PIPE 44, 46, 100, 111, 113, 117, 123, 144, 145, 146, 147, 255, 327, 339, 355.
memory-mapped input/output: MMIX 44, MMIX-MEM 1.
mems: MMIX 50, MMIX-SIM 1.
mems: MMIX-SIM 64, 127.
message: MMIXAL 45.
Metze, Gernot: MMIX 40.
mid: MMIXAL 54, 57, 61, 72, 73, 74, 75, 80.
Miller, Jeffrey Charles Percy: MMIX-ARITH 26.
minus: MMIXAL 82, 97, 101.
minus zero: MMIX 21, 22, 23.
minval: MMIX-CONFIG 12, 13, 20, 23.
missing left parenthesis: MMIXAL 98.
missing right parenthesis: MMIXAL 98.
mm: MMIX-SIM 23, 28, 29, 36, MMIXAL 22, 47, 48, MMOTYPE 6, 13, 21, 23, 25, 30.
mmgetchars: MMIX-IO <u>4</u>, 8, 18, 19, 20, MMIX-PIPE 377, 381, MMIX-SIM 114.
MMIX binary file...: MMMIX 12.
mmix>: MMIX-SIM 3, 150.
MMIX\_config: MMIX-CONFIG 8, 38, MMIX-PIPE 1, 9, 23, 29, 49, 59, 136, 207, 259, MMMIX 2, <u>25</u>.
mmix\_fake\_stdin: MMIX-IO 10, MMIX-SIM 113,
mmix\_fclose: MMIX-IO 11, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
mmix\_fgets: MMIX-IO 14, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
mmix\_fgetws: MMIX-IO 16, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
mmix\_fopen: MMIX-IO 8, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
mmix\_fputs: MMIX-IO 19, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
mmix\_fputws: MMIX-IO 20, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
mmix\_fread: MMIX-IO 12, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
mmix\_fseek: MMIX-IO 21, MMIX-PIPE 372, 376, MMIX-SIM 108, <u>113</u>.
mmix\_ftell: MMIX-IO 22, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
mmix\_fwrite: MMIX-IO 18, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.
MMIX\_init: MMIX-PIPE 1, 9, 10, MMMIX 2.
mmix\_io\_init: MMIX-IO 7, MMIX-SIM 113, 141, MMMIX 2, 25.
mmix\_opcode: MMIX-CONFIG 28, MMIX-PIPE 44, 47, 75, 156, 157, MMIX-SIM 54, 62, 91.
MMIX\_run: MMIX-CONFIG 28, MMIX-PIPE 1, 9, 10, MMMIX 15.
mmmix>: MMMIX 13.
mmo\_buf: MMIXAL 47, 48, 50.
mmo\_byte: MMIXAL <u>48</u>, 74, 75, 80.
mmo\_clear: MMIXAL 47, 49, 52.
mmo\_cur\_file: MMIXAL 50, 51, 141.
mmo\_cur\_loc: MMIXAL 47, 49, 51, 53.
mmo\_err: MMIX-SIM 26, 28, 29, 33, 34, 35.
mmo\_file: MMIX-SIM 24, 25, 26, 32, MMOTYPE 3, 4, 9, 30.
mmo\_file\_name: MMIX-SIM 24, 142.
mmo\_line\_no: MMIXAL 47, 50, <u>51</u>.
mmo\_load: MMIX-SIM 30, 34.
mmo\_loc: MMIXAL 49, 53, 112, 132.
mmo\_lop: MMIXAL 48, 49, 50, 80, 113, 114, 141, 144.
mmo\_lopp: MMIXAL 48, 49, 50, 80, 114, 132.
mmo\_out: MMIXAL <u>47</u>, 48, 50.
mmo\_ptr: MMIXAL 47, 48, 80.
mmo\_sync: MMIXAL 50, 52, 132.
mmo\_tetra: MMIXAL 48, 49, 113, 114, 141, 144.
mmo\_write: MMIXAL 47.
mmputchars: MMIX-IO <u>4</u>, 12, 14, 16, MMIX-PIPE 377, 384, MMIX-SIM 117, 163.
mod: MMIXAL 82, 97, 101.
mode: MMIX-CONFIG 16, 23, 31, MMIX-IO 5, 7, 8, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23, MMIX-PIPE <u>21</u>, <u>167</u>, 217, 257, 263, MMIX-SIM 4, <u>13</u>.
mode\_code: MMIX-IO 8, <u>9</u>.
mode\_string: MMIX-IO 8, 9, MMIX-SIM 4.
MOR: MMIX 12, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.

mor: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 344.
More...chunks are needed: MMIX-PIPE 213.
MORI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 87.
MSE: MMIX-SIM 4.
MUL: MMIX 20, 50, MMIX-PIPE 47, MMIX-SIM 54, 88, MMIXAL 63.
mul: MMIX-CONFIG 27, 28, MMIX-PIPE 49, 51, 343.
MULI: MMIX-PIPE 47, MMIX-SIM 54, 88.
multiprecision conversion: MMIX-ARITH 54, 68.
multiprecision division: MMIX-ARITH 13.
multiprecision multiplication: MMIX-ARITH 8.
MULU: MMIX 20, MMIX-PIPE 47, MMIX-SIM 54, 88, MMIXAL 63.
mulu: MMIX-PIPE 49, 51, 121, 343.
MULUI: MMIX-PIPE 47, MMIX-SIM 54, 88.
mul0: MMIX-CONFIG 15, 27, MMIX-PIPE 49, 343.
mul1: MMIX-CONFIG 15, MMIX-PIPE 49, 343.
mul2: MMIX-CONFIG 15, MMIX-PIPE 49.
mul3: MMIX-CONFIG 15, MMIX-PIPE 49.
mul4: MMIX-CONFIG 15, MMIX-PIPE 49.
mul5: MMIX-CONFIG 15, MMIX-PIPE <u>49</u>.
mul6: MMIX-CONFIG 15, MMIX-PIPE 49.
mul7: MMIX-CONFIG 15, MMIX-PIPE 49.
mul8: MMIX-CONFIG 15, 27, MMIX-PIPE 49, 343.
MUX: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.
mux: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 142.
MUXI: MMIX-PIPE 47, MMIX-SIM 54, 87.
MXOR: MMIX 12, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 87, MMIXAL 63.
MXORI: MMIX-PIPE 47, MMIX-SIM 54, 87.
my\_div: MMIX-PIPE 7.
my\_fsqrt: MMIX-PIPE 7.
my\_random: MMIX-PIPE 7.
myself: MMIX-SIM 142, 143, 144.
n: MMIX-ARITH <u>13</u>, MMIX-CONFIG <u>23</u>, <u>30</u>, 38, MMIX-IO 12, 14, 16, 18, 19, 20, 23, MMIX-SIM <u>148</u>, MMMIX <u>16</u>.
N\_BIT: MMIX-PIPE 54, 271.
name: MMIX-CONFIG 12, 13, 14, 16, 18, 20, 23, 24, 25, 26, 29, 31, 32, 33, 34, 35, 36, MMIX-IO 8, MMIX-PIPE 23, 25, 39, 76, 128, <u>167</u>, 174, 176, 231, 236, 249, 286, MMIX-SIM 4, 35, 38, 44, 49, 51, 64, 130, MMIXAL <u>62</u>, 64, <u>68</u>, 70, MMMIX 24.
name\_buf: MMIX-IO 8.
NaN: MMIX 21.
NaN: MMIX-ARITH 68, <u>70</u>, 73, 84.
nan: MMIX-ARITH 36, 37, 38, 39, 40, 42, 50, 85, 86, 88, 91.
NAND: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.
nand: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138.
NANDI: MMIX-PIPE 47, MMIX-SIM 54, 86.
need\_b: MMIX-PIPE 44, 46, 100, 106, 108, 112, 113, 114, 131, 312, 345,
need\_ra: MMIX-PIPE 44, 46, 100, 108, 112, 113, 131, 324.
NEG: MMIX 9, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.
neg\_one: MMIX-ARITH <u>4</u>, 24, MMIX-IO <u>4</u>, 8, 11, 12, 14, 15, 16, 17, 19, 20, 21, 22, MMIX-PIPE 20, 22, 143, 236, 282, 372, MMIX-SIM 13, 14, 53, 77, 90, MMIXAL 27, 29, MMMIX 12, 23, 25.
negate: MMIXAL 82, 86, 100.
negate\_q: MMIX-ARITH 24.
negation, floating point: MMIX 13.
negative locations: MMIX 35, 40, 44.
NEGI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85, MMMIX 12.
NEGU: MMIX 9, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85, MMIXAL 63.
NEGUI: MMIX-PIPE 47, MMIX-SIM 54, 85.
new\_cache: MMIX-CONFIG 16, 17, 21.
new\_chunk: MMMIX 5, 6, 7, 8, 9, 11.
new\_cool: MMIX-PIPE 75, 78, 101.
new\_fetch: MMIX-PIPE 288, 298, 301, 302.
new\_head: MMIX-PIPE 74, 75, 81, 85, 120.
new\_inst\_ptr: MMMIX <u>15</u>.
new\_L: MMIX-PIPE 120.
new\_link: MMIXAL 109, 110, 115.
new\_mem: MMIX-SIM 17, 18, 21.
new\_mode: MMIX-SIM 152.
new\_O: MMIX-PIPE 75, 99, 100, 119, 120, 333, 334, 338, 339.
new\_Q: MMIX-PIPE 146, 148, 149, 310, 314, 329.
new\_S: MMIX-PIPE 75, 99, 100, 113, 114, 333, 334, 339.
new\_sym\_node: MMIXAL 59, 64, 66, 70, 71, 87, 111, 118, 125, 130.
new\_tail: MMIX-PIPE 301, MMMIX 22.
new\_trie\_node: MMIXAL 55, 57, 61.
next: MMIX-PIPE 23, 26, 28, 32, 33, 35, 82, 125, 134, 145, 176, 183, 196, 202, 205, 217, 218, 221, 225, 233, 234, 259, 261, 263, 266, 272, 274, 276, 298, 300, 326, 350, 361, 363, 364, 368.
next\_char: MMIX-ARITH 68, 69, 71, 72, 73, 77, MMIX-SIM <u>13</u>, 152, 153, 154, 155, 161.
next\_sym\_node: MMIXAL 59, 60.
next\_sync: MMIX-PIPE 364.
next\_trie\_node: MMIXAL 55, 56.
next\_val: MMIXAL <u>83</u>, 99, 101.
NNIX operating system: MMIX 2.
no base address...: MMIXAL 127.
No file was selected...: MMOTYPE 20.
No name given...: MMOTYPE 20.
no opcode...: MMIXAL 104.
No room...: MMIX-SIM 35, 42, 77, MMIXAL 32, 84, MMOTYPE 20.

no-op: MMIX 49.

no\_const\_found: MMIX-ARITH 68.

no\_hardware\_PT: MMIX-CONFIG 37, MMIX-PIPE 242, 272, 298.

no\_label\_bit: MMIXAL 62, 102.

NONEXISTENT\_MEMORY: MMIX-PIPE 57.

Nonzero byte follows...: MMOTYPE 30.

noop: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 80, 118, 122, 322, 323, 327, 332, 337.

noop\_inst: MMIX-PIPE 118, 227.

NOR: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

nor: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138.

NORI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 86.

normal numbers: MMIX 21.

not a valid prefix: MMIXAL 132.

note\_usage: MMIX-PIPE 188, 189, 190, 196.

noted: MMIX-PIPE 68, 73, 75, 85, 304, MMMIX 22.

null string...: MMIXAL 93.

nullifying: MMIX-PIPE 75, 85, 146, 147, 160, 310, <u>315</u>, 316.

num: MMIX-ARITH 36, 37, 38, 39, 40, 41, 42, 44, 46, 50, 85, 86, 88, 91, 93.

NXOR: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 86, MMIXAL 63.

nxor: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138.

NXORI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 86.

nybble: MMIX 6, 11.

O: MMIX-SIM 75.

o: MMIX-ARITH 29, 31, 34, 47, 50, 88, MMIX-IO 12, 14, 16, 19, 20, 22, MMIX-PIPE 19, 40, 157, 246, MMIX-SIM 12, 15, 91, 137, 140, 154, 160, MMIXAL 49, 114, 127, MMOTYPE 8.

O\_BIT: MMIX-ARITH 31, 33, 35, MMIX-PIPE 54, MMIX-SIM 57, MMIXAL 69.

O\_Handler: MMIXAL 69.

oand: MMIX-ARITH 25, MMIX-PIPE 21, 241, MMIX-SIM <u>13</u>, MMIXAL <u>28</u>, 107.

oandn: MMIX-ARITH 25, MMIX-PIPE 21, 146, 240, 241, 279, 325.

obj\_file: MMIXAL 47, 138, 139.

obj\_file\_name: MMIXAL 47, 137, 138, 139.

obj\_time: MMIX-SIM 28, 31, 44.

object files: MMIXAL 22.

OCTA: MMIXAL 17, 62, 63, 117, 118.

octa: MMIX-ARITH 3, 4, 5, 6, 7, 8, 9, 12, 13, 24, 25, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46, 47, 50, 54, 56, 69, 85, 86, 87, 88, 89, 91, 93, MMIX-CONFIG 31, 32, 33, 37, MMIX-IO <u>3</u>, 4, 8, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23, MMIX-MEM 1, 2, 3, MMIX-PIPE 9, 10, <u>17</u>, 18, 19, 20, 21, 40, 44, 46, 68, 87, 90, 91, 98, 99, 141, 148, 156, 157, 167, 192, 193, 197, 201, 203, 204, 205, 206, 208, 209, 210, 212, 213, 216, 219, 220, 237, 238, 239, 240, 241, 246, 254, 255, 268, 270, 271, 278, 282, 284, 297, 372, 373, 376, 377, 378, 379, 380, 381, 384, MMIX-SIM 10, 12, 13, 15, 16, 19, 20, 31, 50, 52, 61, 76, 77, 91, 113, 114, 117, 137, 140, 151, 154, 160, 162, 165, MMIXAL 26, 27, 28, 43, 49, 51, 58, 82, 83, 114, 126, 127, 131, 133, MMMIX 5, 16, 17, 20, 25, MMOTYPE 7, 8, 16.

octabyte: MMIX 6.

odd: MMIX-ARITH 93, 94, 95.

ODIF: MMIX 11, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.

odif: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 344.

ODIFI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 87.

odiv: MMIX-ARITH 13, 24, 45, MMIX-PIPE 21, 343, MMIX-SIM 13, 88, MMIXAL 28, 101.

off: MMIX-PIPE 185, 210, 213, 216, 219, 223, 226.

offset: MMIX-IO <u>21</u>, MMIX-SIM 4, <u>20</u>, <u>154</u>.

old\_hot: MMIX-PIPE 60, 64, 276, 283, 310, 322, 323, 328, 329, 342, 351, 353, 356, 364.

old\_L: MMIX-SIM 60, 61, 98, 132.

old\_tail: MMIX-PIPE 64, 69, 70, 74, 75, 85, 308, 309.

ominus: MMIX-ARITH 5, 12, 24, 47, 53, 73, 88, 89, 92, 94, 95, MMIX-IO 4, 12, 18, MMIX-PIPE 21, 139, 140, 344, MMIX-SIM 13, 85, 87, MMIXAL 28, 49, 100, 101, 114, 126, 127, 131.

omult: MMIX-ARITH 8, 12, 43, MMIX-PIPE 21, 343, MMIX-SIM <u>13</u>, 88, MMIXAL <u>28</u>, 101.

one\_arg\_bit: MMIXAL <u>62</u>, 116.

oo: MMIX-ARITH 31, 34, 47, 49, 50, 53, 87.

oops: MMIX 50, MMIX-SIM 1.

oops: MMIX-SIM 64, 127, MMMIX 10.

Oops...too long: MMOTYPE 26.

OP: MMIX-CONFIG <u>15</u>, 17, 24.

op: MMIX-PIPE 44, 46, 75, 80, 81, 82, 84, 85, 100, 102, 103, 108, 109, 112, 113, 114, 117, 124, 139, 151, 152, 155, 156, <u>157</u>, 236, 256, 271, 279, 281, 282, 312, 320, 321, 327, 332, 339, 344, 345, 346, 348, MMIX-SIM 60, 62, 65, 70, 71, 78, 79, 85, 87, 89, <u>91, 92, 93</u>, 94, 95, 123, 126, 127, 130, 131.

OP codes: MMIX 5.

OP codes, table: MMIX 51.

op\_bits: MMIXAL 102, 104, <u>105</u>, 107, 116, 121, 122, 123, 124, 129.

op\_field: MMIXAL 32, 33, 102, 104, 116, 121, 122, 123, 124, 129.

op\_info: MMIX-SIM 64, 65.

op\_init\_size: MMIXAL 63, 64.

op\_init\_table: MMIXAL 63, 64.

op\_ptr: MMIXAL 83, 85, 86, 98, 101.

op\_root: MMIXAL 56, 61, 64, 80, 104.

OP\_size: MMIX-CONFIG <u>15</u>, 17, 24.

op\_spec: MMIX-CONFIG 14, 15, 17, MMIXAL <u>62</u>, 63, 64.

op\_stack: MMIXAL 81, 82, 83, 84, 85, 86, 98, 101.

opcode: MMIXAL 102, 104, 105, 109, 117, 118, 119, 121, 124, 126, 127, 128, 129, 131, 132.

opcode syntax error...: MMIXAL 104.

opcode...operand(s): MMIXAL 116.

opcode\_name: MMIX-PIPE 48, 73.

open: MMIX-ARITH 65.

operand of 'BSPEC'...: MMIXAL 132.

operand...register number: MMIXAL 129.

operand\_list: MMIXAL 32, 33, 85, 86, 106.

operands\_done: MMIXAL 85, 98.

operating system: MMIX 2, 29, 30, 33, 35, 37, 38, 43, 44, 47, MMIX-PIPE 243.

oplus: MMIX-ARITH 5, 47, 53, 73, MMIX-IO 4, MMIX-PIPE 21, 139, 140, 241, 265, 331, MMIX-SIM <u>13</u>, 60, 84, 85, 101, 154, MMIXAL <u>28</u>, 94, 99.

ops: MMIX-CONFIG 18, 25, 29, MMIX-PIPE 76, 79, 82.

OR: MMIX 10, MMIX-PIPE <u>47</u>, 114, MMIX-SIM <u>54</u>, 86, MMIXAL 63.

or: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 114, 138, MMIXAL <u>82</u>, 97, 101.

ORH: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63, 128.

ORI: MMIX-PIPE 47, MMIX-SIM 54, 86, 126, 131.

origin: MMIX-ARITH 63, 64, 65.

ORL: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63, 128.

ORMH: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

ORML: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

ORN: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

orn: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 138.

ORNI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 86.

out\_stab: MMIXAL 74, 75, 80.

outbuf: MMIX-CONFIG 31, MMIX-PIPE 167, 176, 202, 203, 205, 215, 216, 217, 218, 219, 221, 259, 378, 379.

outer\_lp: MMIXAL 82, 85, 98.

outer\_rp: MMIXAL 82, 97, 98.

over: MMIXAL 82, 97, 101.

overflow: MMIX 8, 9, 20, 21, 22, 32.

overflow: MMIX-ARITH 9, 12, 24, MMIX-PIPE 20, 21, 343, MMIX-SIM 13, 88, MMIXAL 27.

owner: MMIX-PIPE 44, 46, 63, 67, 73, 81, 124, 134, 144, 145, 244, 314, 357.

oxor: MMIX-ARITH 25.

o0: MMIX-ARITH <u>34</u>.

p: MMIX-ARITH 60, 61, 62, 66, 70, 82, 89, MMIX-CONFIG <u>10</u>, <u>11</u>, <u>36</u>, MMIX-IO <u>13</u>, <u>14</u>, <u>16</u>, MMIX-PIPE <u>26</u>, <u>28</u>, <u>33</u>, <u>35</u>, <u>40</u>, 63, 73, 120, 170, 172, 179, 185, 187, 189, 191, 193, 196, 199, 201, 203, 205, 251, <u>255</u>, <u>256</u>, <u>258</u>, <u>378</u>, <u>379</u>, <u>381</u>, <u>384</u>, <u>387</u>, MMIX-SIM <u>17</u>, <u>20</u>, <u>42</u>, <u>50</u>, <u>62</u>, <u>114</u>, <u>117</u>, <u>120</u>, <u>154</u>, <u>162</u>, <u>165</u>, MMIXAL <u>40</u>, <u>50</u>, <u>57</u>, <u>59</u>, MMMIX <u>17</u>, MMOTYPE <u>1</u>.

P\_BIT: MMIX-PIPE 54, 81, 149, 160, 322, 331, MMMIX 15.

pack\_bytes: MMIX-PIPE 320, 335, 341.

packit: MMIX-ARITH 71, 78, 79.

page coloring: MMIX-PIPE 268, 292.

page fault: MMIX 37.

page table entry: MMIX 45.

page table pointer: MMIX 45.

page\_b: MMIX-PIPE 238, 239, 243, 244, MMMIX 12, <u>25</u>.

page\_bad: MMIX-PIPE 238, 239, 266, 288, MMMIX 12, 23, 25.

page\_f: MMIX-PIPE 238, 239, 272, 298.

page\_mask: MMIX-PIPE 238, 239, 240, 241, 279, 325, MMMIX 12, 23, 25.

page\_n: MMIX-PIPE 238, 239, 240, 279.

page\_r: MMIX-PIPE 238, 239, 244, MMMIX 12,

page\_s: MMIX-PIPE 238, 239, 243, 268, 292, MMMIX 12, 25.

panic: MMIX-CONFIG 8, 10, 16, 18, 19, 20, 23, 24, 25, 29, 31, 32, 33, 34, 35, 36, 37, 38, MMIX-PIPE 13, 22, 28, 135, 185, 187, 213, MMIX-SIM 14, 17, 24, 41, 42, 77, 120, MMIXAL 29, 32, 38, <u>45</u>, 50, 55, 59, 84.

PARITY\_ERROR: MMIX-PIPE 57.

pass\_after: MMIX-PIPE 125, 134, 266, 268, 270, 271, 288, 350, 353.

pass\_data: MMIX-PIPE <u>134</u>, 135.

passit: MMIX-PIPE <u>134</u>, 266, 268, 270, 271, 288, 350, 353, MMIX-SIM 161.

Patterson, David Andrew: MMIX 1, MMIX-PIPE 2, 58, 150, 163.

PBEV: MMIX 17, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93, MMIXAL 63.

PBEVB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBN: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBNB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBNN: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBNNB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBNP: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBNPB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBNZ: MMIX 17, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93, MMIXAL 63.

PBNZB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBOD: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBODB: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93.

PBP: MMIX 17, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93, MMIXAL 63.

PBPB: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93.

pbr: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 81, 85, 106, 152, 155.

PBZ: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBZB: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 93.

pcs: MMIX-CONFIG 21, 23.

peek\_hist: MMIX-PIPE 68, 74, 75, 85, 99, 100, 151, 152.

peekahead: MMIX-CONFIG 15, MMIX-PIPE 59,

performance monitoring: MMIX 40.

permission bits: MMIX 37, 46.

phys\_addr: MMIX-PIPE 240, 241, 269, 292, 295, 298.

physical addresses: MMIX 44, 45, 47.

pipe\_bit: MMIX-PIPE 8, 10.

pipe\_limit: MMIX-CONFIG 24, MMIX-PIPE 136.

pipe\_seq: MMIX-CONFIG 17, 24, 27, MMIX-PIPE 133, 134, 136, 141.

pixels: MMIX 11.

plus: MMIXAL 82, 97, 99.

policy: MMIX-PIPE 186, 187, 189, 191.

Pool\_Segment: MMIX-SIM 3, 6, 37, 163, MMIXAL 69.

POP: MMIX 29, MMIX-PIPE 47, MMIX-SIM 54, 101, MMIXAL 63.

pop: MMIX-CONFIG 28, MMIX-PIPE 46, 49, 51, 85, 114, 120, 331.

pop\_unsave: MMIX-PIPE 120, 332.

population counting: MMIX 12.

ports: MMIX-CONFIG 16, 23, 34, MMIX-PIPE 128, <u>167</u>, 183.

postamble: MMIX-SIM 25, 29, 32, MMOTYPE 1,

power-saver mode: MMIX 31.

POWER\_FAILURE: MMIX-PIPE 57.

power\_of\_two: MMIX-CONFIG 12, 13, 20, 23.

pp: MMIX-ARITH <u>61</u>, 80, <u>81</u>, MMIX-PIPE 184, 185, MMIXAL 59, 64, 65, 66, 70, 74, 78, 87, 104, 109, 110, 111, 112, 118, 125, 130.

ppol: MMIX-CONFIG 22, 23.

PR\_BIT: MMIX-PIPE 54, 266, 269.

prec: MMIXAL <u>82</u>, 83.

precedence: MMIXAL 83, 85.

predef\_size: MMIXAL 69, 70.

predef\_spec: MMIXAL 68, 69, 70.

PREDEFINED: MMIXAL <u>58</u>, 64, 66, 70, 87, 109.

predefined symbols: MMIXAL 10, 67, 69.

predefs: MMIXAL 69, 70.

predicted: MMIX-PIPE 85, 151.

PREFIX: MMIXAL 16, 62, 63, 129, 132.

PREGO: MMIX 30, MMIX-PIPE 47, 235, MMIX-SIM <u>54</u>, 106, MMIXAL 63.

prego: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 81, 227, 265, 271, 288, 289, 294, 296, 298, 300, 301.

PREGOI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, <u>106</u>.

PRELD: MMIX 30, MMIX-PIPE 47, MMIX-SIM 54, 106, MMIXAL 63.

preld: MMIX-PIPE 49, 51, 81, 227, 265, 266, 269, 271, 272, 273, 274.

PRELDI: MMIX-PIPE 47, MMIX-SIM 54, 106.

Premature end of file...: MMMIX 10.

PREST: MMIX 30, 34, MMIX-PIPE 47, MMIX-SIM 54, 106, MMIXAL 63.

prest: MMIX-PIPE 49, 51, 81, 227, 265, 269, 271, 272, 273, 274, 275.

prest\_span: MMIX-PIPE 275, 276.

prest\_win: MMIX-PIPE 267, 276.

PRESTI: MMIX-PIPE 47, MMIX-SIM 54, 106.

print\_bits: MMIX-PIPE 46, <u>55</u>, <u>56</u>, 73.

print\_cache: MMIX-PIPE 175, 176, MMMIX 21.

print\_cache\_block: MMIX-PIPE 171, 172, 177.

print\_cache\_locks: MMIX-PIPE 39, 173, 174.

print\_control\_block: MMIX-PIPE 45, 46, 63, 81, 125, 145, 146, 147.

print\_coroutine\_id: MMIX-PIPE 24, 25, 28, 33, 63, 73, 81, 125, 145.

print\_fetch\_buffer: MMIX-PIPE 72, 73, 253.

print\_float: MMIX-ARITH 54, 59, MMIX-SIM 13, 137, 159.

print\_freqs: MMIX-SIM 50, 53.

print\_hex: MMIX-SIM 12, 137, 138, 159.

print\_int: MMIX-SIM 15, 137, 159.

print\_line: MMIX-SIM 45, 47.

print\_locks: MMIX-PIPE 10, 38, 39, MMMIX 21.

print\_octa: MMIX-PIPE 18, 19, 43, 46, 73, 91, 146, 149, 152, 160, 176, 251, 283, 310, 314, 319, 320, 321.

print\_pipe: MMIX-PIPE 10, <u>252</u>, <u>253</u>, MMMIX 21.

print\_reorder\_buffer: MMIX-PIPE 62, 63, 253.

print\_spec: MMIX-PIPE 42, 43, 46.

print\_specnode: MMIX-PIPE 43, 46.

print\_specnode\_id: MMIX-PIPE 43, 73, 90, 91.

print\_stab: MMOTYPE 25, 26.

print\_stats: MMIX-PIPE <u>161</u>, <u>162</u>, MMMIX 2, 21.

print\_string: MMIX-SIM 159, 160.

print\_trip\_warning: MMIX-IO 23, MMIX-PIPE 373, 376, MMIX-SIM 109, 113.

print\_write\_buffer: MMIX-PIPE 250, 251, 253.

printf: MMIX-ARITH 54, 55, 57, 67, MMIX-MEM 2, 3, MMIX-PIPE 10, 19, 25, 28, 33, 39, 43, 46, 56, 63, 73, 81, 91, 125, 145, 146, 147, 149, 152, 160, 162, 172, 174, 176, 177, 251, 283, 310, 314, 319, 320, 321, 387, MMIX-SIM 12, 15, 45, 47, 49, 51, 53, 82, 83, 103, 105, 120, 128, 130, 131, 132, 133, 137, 138, 140, 143, 149, 150, 159, 160, 162, MMMIX 2, 13, 14, 15, 18, 19, 22, 23, 24, MMOTYPE 9, 15, 19, 21, 23, 24, 25, 28, 30.

priority: MMIX-SIM 17, 19, 21.

privileged instructions: MMIX 37.

privileged operations: MMIX 31, 33, 37, 43, 44.

privileged\_inst: MMIX-PIPE 118, 355, MMIX-SIM 60, 94, 95, 97, <u>107</u>, 108, 109.

profile\_gap: MMIX-SIM <u>48</u>, 53, 143.

profile\_showing\_source: MMIX-SIM 48, 53, 143.

profile\_started: MMIX-SIM 51, 52.

profiling: MMIX-SIM 141, 143, <u>144</u>.

prog\_file: MMMIX 4, 5, 6, 9, 10.

prog\_file\_name: MMMIX 2, 3, 4, 6, 9, 10, 11.

program counter: MMIX-PIPE 284.

PROT\_OFFSET: MMIX-PIPE 54, 269, 293, 298.

protection bits: MMIX 37, 46.

protection fault: MMIX 45.

prototypes for functions: MMIX-ARITH 2, MMIX-PIPE 6.

prts: MMIX-CONFIG 13, 15, 23.

prune: MMIXAL 73, 80.

PRW\_BITS: MMIX-PIPE 266, 269.

pseudo\_lru: MMIX-CONFIG 22, MMIX-PIPE 164, 186, 187, 189, 191.

pseudo\_op: MMIXAL 62.

pst: MMIX-PIPE 49, 51, 117, 254, 265, 266, 271, 280, 321, 357.

PTE: MMIX 45, 47.

PTP: MMIX 45, 47.

ptr\_a: MMIX-CONFIG 16, MMIX-PIPE 44, 114, 117, 215, 217, 222, 224, 227, 236, 237, 249, 254, 255, 325, 326, 333, 334.

ptr\_b: MMIX-PIPE 44, 217, 218, 222, 224, 225, 232, 233, 234, 237, 257, 261, 262, 272, 274, 298, 300, 326.

ptr\_c: MMIX-PIPE 44, 224, 225, 236, 237.

pure: MMIXAL 82, 87, 94, 99, 100, 101, 110, 116, 124, 129.

push: MMIX-SIM 101.

push\_pop\_bit: MMIX-SIM 65, 132.

PUSHGO: MMIX 29, MMIX-PIPE 47, MMIX-SIM 54, 101, MMIXAL 63.

pushgo: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 85, 110, 119, 331.

PUSHGOI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, <u>101</u>.

PUSHJ: MMIX 29, MMIX-PIPE 47, MMIX-SIM 54, 101, MMIXAL 63.

pushj: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 85, 110, 119, 327.

PUSHJB: MMIX-PIPE 47, MMIX-SIM 54, 101.

PUT: MMIX 43, MMIX-PIPE 47, MMIX-SIM 54, 97, MMIXAL 63.

put: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 118, 146, 149, 329.

PUTI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 97, MMMIX 12.

PV: MMIX-CONFIG <u>15</u>, 17, 20.

PV\_size: MMIX-CONFIG 15, 17, 20.

pv\_spec: MMIX-CONFIG 12, 15, 17.

PW\_BIT: MMIX-PIPE 54, 266, 269.

PX\_BIT: MMIX-PIPE 54, 269, 293, 298, 301.

q: MMIX-ARITH 13, 24, 60, 61, 62, 70, 82, MMIX-CONFIG 10, MMIX-PIPE 35, 196, 205, <u>255</u>, <u>256</u>, <u>258</u>, <u>378</u>, <u>379</u>, MMIX-SIM <u>21</u>, MMIXAL <u>40</u>.

qhat: MMIX-ARITH 13, 20, 21, 22, 23.

qloop: MMIX-PIPE 255.

qq: MMIX-ARITH <u>61</u>, <u>62</u>, MMIXAL <u>65</u>, 112, 113, 114, 118, 125, 130.

quantify\_mul: MMIX-PIPE 343.

queuelist: MMIX-PIPE 34, 35, 125.

quiet NaN: MMIX 21.

r: MMIX-ARITH 31, 34, 62, 86, 88, 89, 91, MMIX-PIPE 35, 93, 95, 189, 191, MMIX-SIM 15, 22.

rA: MMIX 21, 22, 32, 38.

rA: MMIX-PIPE 52, 107, 108, 146, 324, 329, 334, 342, MMIX-SIM 55, 72, 97, 103, 105, 122, 131, 151, 158.

ra: MMIX-PIPE 44, 46, 59, 100, 108, 131, 144, 307, 308, 324, 346.

radix conversion: MMIX-ARITH 54, 68, MMMIX 17.

random: MMIX-CONFIG 16, 22, MMIX-PIPE 7, 164, 167, 186, 187.

rank: MMIX-PIPE 167, 172, 186, 187, 188, 189, 191.

rB: MMIX 35.

rB: MMIX-PIPE <u>52</u>, 86, 310, 312, 319, MMIX-SIM 55, 72, 102, 104, 123, 151.

rBB: MMIX 36, 38.

rBB: MMIX-PIPE <u>52</u>, 312, 319, 322, 372, 380, MMIX-SIM <u>55</u>, 108, 151.

rC: MMIX 45.

rC: MMIX-PIPE 52, 269, MMIX-SIM 55, 151.

rD: MMIX 20.

rD: MMIX-PIPE 52, 107, MMIX-SIM 55, 66, 151.

rE: MMIX 25.

rE: MMIX-PIPE <u>52</u>, 107, 108, MMIX-SIM <u>55</u>, 66, 151.

read\_bit: MMIX-SIM 58, 83, 161, 162.

read\_byte: MMIX-SIM 27, MMOTYPE 10, 26, 27, 28, 30.

read\_hex: MMIX-MEM 1, 2, MMMIX 15, 17, 18, 22.

read\_tet: MMIX-SIM 26, 27, 28, 29, 33, 34, 35, 36, 37, MMOTYPE 9, 10, 13, 18, 19, 20, 21, 23, 24, 25, 30.

reader: MMIX-CONFIG 34, MMIX-PIPE 128, 167, 183, 233, 257, 266, 267, 271, 272, 273, 288, 291, 296, 353, 354, 358, 359, 360, 365, 366.

ready: MMIX-SIM 150.

REBOOT\_SIGNAL: MMIX-PIPE 57.

recycle\_fixup: MMIXAL 59, 112.

redefinition...: MMIXAL 109.

reg\_val: MMIXAL 82, 87, 98, 99, 100, 101, 109, 110, 118, 121, 122, 123, 124, 129.

REGISTER: MMIXAL <u>58</u>, 74, 78, 87, 109, 110.

register number...: MMIXAL 98, 118.

register stack: MMIX 29, 42, 43.

register\_truth: MMIX-PIPE 155, 156, 157, 345, MMIX-SIM <u>91</u>, 92, 93.

registerize: MMIXAL 82, 86, 100.

rel\_addr\_bit: MMIX-PIPE 75, 83, 106, MMIX-SIM 60, <u>65</u>, MMIXAL <u>62</u>, 124, 129.

relative address...: MMIXAL 114, 126, 131.

release\_lock: MMIX-PIPE <u>37</u>, 222, 226, 233, 234, 272, 298, 356.

ren\_a: MMIX-PIPE 44, 46, 100, 111, 117, 119, 121, 123, 144, 145, 146, 147, 312, 322, 334, 340.

rename registers: MMIX-PIPE 44, 86.

reorder\_bot: MMIX-CONFIG 37, MMIX-PIPE <u>60</u>, 63, 67, 75, 145, 159, 318, 357, MMMIX 18.

reorder\_buf\_size: MMIX-CONFIG <u>15</u>, 37.

reorder\_top: MMIX-CONFIG 37, MMIX-PIPE 60, 61, 63, 67, 75, 145, 159, 318, 357, MMMIX 18.

repeating: MMIX-SIM 149, 152, 153, 156.

repl: MMIX-CONFIG 16, 23, MMIX-PIPE 167, 196, 199, 205.

replace\_policy: MMIX-CONFIG 22, MMIX-PIPE <u>164</u>, 167, 186, 187, 188, 189, 190, 191.

report\_error: MMIXAL <u>45</u>.

res: MMIX-PIPE 93.

resum: MMIX-PIPE 49, 67, 314, 323, 325.

RESUME: MMIX 38, 49, 50, MMIX-PIPE 47, 304, 323, MMIX-SIM 54, 60, 124, 125, 130, MMIXAL 63, MMMIX 12.

resume: MMIX-CONFIG 28, MMIX-PIPE <u>49</u>, 51, 85, 149, 322, 323, 325.

RESUME\_AGAIN: MMIX-PIPE <u>320</u>, 323, MMIX-SIM 71, 125, 130, 164.

resume\_again: MMIX-PIPE 323.

RESUME\_SET: MMIX-PIPE 307, 320, 323, 324, MMIX-SIM 122, 125, 126, 130.

resume\_simulation: MMIX-SIM 149.

resuming: MMIX-PIPE 73, 78, 81, 103, 160, 308, 309, 316, 323, 324, MMIX-SIM 60, 61, 71, 125, 127, 130, 141, 164.

Reuter, Andreas Horst: MMIX 31.

reversed: MMIX-PIPE 152.

rewind: MMIX-CONFIG 19, 38.

rF: MMIX 48.

rF: MMIX-PIPE 52, MMIX-SIM 55, 151.

rf: MMIX-ARITH 91, 92.

rG: MMIX 29, 39.

rG: MMIX-PIPE <u>52</u>, 89, 102, 329, 330, 334, 342, MMIX-SIM <u>55</u>, 97, 104, 105, 151, 158.

rH: MMIX 20.

rH: MMIX-PIPE <u>52</u>, 121, MMIX-SIM <u>55</u>, 88, 151.

rhat: MMIX-ARITH <u>13</u>, 21.

rhs: MMIX-SIM 96, 97, 98, 108, 133, 139.

rI: MMIX 40, MMIX-SIM 1.

rI: MMIX-PIPE <u>52</u>, 314, MMIX-SIM <u>55</u>, 93, 127, 151, 158, MMMIX 21.

right: MMIX-SIM <u>16</u>, 21, 22, 50, 162, 165, MMIXAL <u>54</u>, 57, 72, 73, 74.

right\_paren: MMIX-SIM 138, <u>139</u>.

ring: MMIX-CONFIG 36, MMIX-PIPE 26, 28, 29, 34, 35.

ring of local registers: MMIX 42, 43.

ring\_size: MMIX-CONFIG 36, MMIX-PIPE 26, 27, 28, 29, 125.

rJ: MMIX 19, 29, 35.

rJ: MMIX-PIPE <u>52</u>, 85, 107, 119, 312, 319, MMIX-SIM <u>55</u>, 69, 101, 123, 151.

rK: MMIX 36, 37, 38.

rK: MMIX-PIPE <u>52</u>, 149, 314, 317, 322, 328, MMIX-SIM <u>55</u>, 77, 151, MMMIX 12, 15, 23.

rL: MMIX 29, 39, 43.

rL: MMIX-PIPE <u>52</u>, 102, 112, 114, 119, 120, 329, 330, 334, 338, MMIX-SIM 37, <u>55</u>, 81, 97, 101, 102, 104, 151, 158.

rl: MMIX-PIPE <u>44</u>, 46, 100, 112, 114, 119, 120, 123, 145, 146, 147, 334, 338.

rM: MMIX 10.

rM: MMIX-PIPE <u>52</u>, 107, MMIX-SIM <u>55</u>, 69, 151.

rN: MMIX 41.

rN: MMIX-PIPE <u>52</u>, 89, MMIX-SIM <u>55</u>, 77, 151.

rO: MMIX 42, 43.

rO: MMIX-PIPE <u>52</u>, 98, 118, MMIX-SIM <u>55</u>, 101, 102, 104, 151, MMMIX 19.

Robertson, James Evans: MMIX 40.

rop: MMIX-SIM 61, 71, 125, 126, 130, 164.

ropcodes: MMIX 38, 47, 49.

Rossmanith, Peter: MMIX-ARITH 26.

ROUND\_CURRENT: MMIXAL 14, 69.

ROUND\_DOWN: MMIX 28, MMIX-ARITH <u>30</u>, 33, 35, 46, 87, MMIX-PIPE <u>346</u>, MMIX-SIM <u>100</u>, 133, MMIXAL 14, 69.

round\_mode: MMIX-SIM <u>61</u>, 89, 133, 138.

ROUND\_NEAR: MMIX 28, MMIX-ARITH <u>30</u>, 33, 35, 84, 87, MMIX-PIPE <u>346</u>, MMIX-SIM 77, <u>100</u>, 133, 158, MMIXAL <u>14</u>, 69.

ROUND\_OFF: MMIX 28, MMIX-ARITH 30, 33, 35, 39, 46, 87, 94, MMIX-PIPE 346, MMIX-SIM 100, 133, MMIXAL 14, 69.

ROUND\_UP: MMIX 28, MMIX-ARITH <u>30</u>, 33, 35, 87, MMIX-PIPE <u>346</u>, MMIX-SIM <u>100</u>, 133, MMIXAL 14, 69.

rounding modes: MMIX 21, 32.

rP: MMIX 31.

rP: MMIX-PIPE <u>52</u>, 283, 335, 341, MMIX-SIM <u>55</u>, 96, 102, 104, 151.

rQ: MMIX 37, 40, 43.

rQ: MMIX-PIPE <u>52</u>, 146, 149, 310, 314, 328, 329, MMIX-SIM <u>55</u>, 151, MMMIX <u>12</u>.

rR: MMIX 20.

rR: MMIX-PIPE <u>52</u>, 121, 335, 341, MMIX-SIM 55, 88, 102, 104, 151.

rr: MMIX-CONFIG <u>22</u>.

rS: MMIX 42, 43.

rS: MMIX-PIPE <u>52</u>, 98, 118, MMIX-SIM <u>55</u>, 82, 83, 101, 102, 103, 104, 105, 151, MMMIX 19.

rT: MMIX 36.

rT: MMIX-PIPE <u>52</u>, 122, 310, 312, 372, MMIX-SIM <u>55</u>, 77, 151, MMMIX 12.

rt\_op: MMIXAL <u>83</u>, 85, 97, 98.

rTT: MMIX 37.

rTT: MMIX-PIPE 52, 314, MMIX-SIM 55, 77, 151, MMMIX 12.

rU: MMIX 40, MMIX-SIM 1.

rU: MMIX-PIPE <u>52</u>, 100, 146, MMIX-SIM <u>55</u>, 127, 140, 151.

running times, approximate: MMIX 50.

rV: MMIX 44, 45, 47.

rV: MMIX-PIPE <u>52</u>, 329, MMIX-SIM <u>55</u>, 77, 151, MMMIX 12.

rv: MMIX-PIPE 239.

rW: MMIX 34, 38.

rW: MMIX-PIPE <u>52</u>, 320, 322, 373, MMIX-SIM 55, 109, 123, 124, 151.

rWW: MMIX 36, 38.

rWW: MMIX-PIPE <u>52</u>, 320, 322, 373, MMIX-SIM 55, 108, 151.

rX: MMIX 34, 37, 38.

rX: MMIX-PIPE 52, 320, 322, MMIX-SIM 55, 60, 123, 124, 126, 151, 164.

rXX: MMIX 36, 38.

rXX: MMIX-PIPE 52, 320, 322, 372, MMIX-SIM 55, 108, 151.

rY: MMIX 34, 38.

rY: MMIX-PIPE <u>52</u>, 321, 324, MMIX-SIM <u>55</u>, 123, 126, 151.

rYY: MMIX 36, 38.

rYY: MMIX-PIPE <u>52</u>, 321, 323, 324, MMIX-SIM <u>55</u>, 108, 151.

rZ: MMIX 34, 38.

rZ: MMIX-PIPE <u>52</u>, 321, 324, 335, 339, MMIX-SIM <u>55</u>, 102, 103, 104, 105, 123, 126, 151.

rZZ: MMIX 36, 38.

rZZ: MMIX-PIPE <u>52</u>, 321, 323, 324, MMIX-SIM <u>55</u>, 108, 151.

S: MMIX-SIM <u>76</u>.

s: MMIX-ARITH 7, 31, 34, 37, 38, 39, 40, 50, 66, 68, 89, MMIX-IO 14, 16, MMIX-MEM 1, MMIX-PIPE 21, 28, 43, 133, 134, 187, 189, 191, 193, 196, 205, 385, MMIX-SIM 13, 118, 154, MMIXAL 28, 41, 57.

S\_BIT: MMIX-PIPE 54, 149.

S\_non\_miss: MMIX-PIPE 224.

SADD: MMIX 12, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 87, MMIXAL 63.

sadd: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 344.

SADDI: MMIX-PIPE 47, MMIX-SIM 54, 87.

Satterthwaite, Edwin Hallowell, Jr.: MMIX-SIM 131.

saturating arithmetic: MMIX 11.

sav: MMIX-PIPE 49, 327, 337.

SAVE: MMIX 43, 50, MMIX-PIPE 47, 81, 281, 341, MMIX-SIM <u>54</u>, 102, MMIXAL 63.

save: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 327, 337, 340.

Scache: MMIX-CONFIG 17, 21, 35, 36, MMIX-PIPE 39, 168, 215, 217, 218, 219, 220, 221, 222, 224, 225, 226, 234, 261, 274, 300, 360, 364, 367, 378, 379, MMMIX 21.

scan\_close: MMIXAL 85, 98.

scan\_const: MMIX-ARITH 68, 69, MMIX-SIM 13, 153.

scan\_eql: MMIX-SIM 153.

scan\_hex: MMIX-SIM 152, 153, 154, 155, 161.

scan\_open: MMIXAL 86.

scan\_option: MMIX-SIM 142, 143, 149.

scan\_string: MMIX-SIM 153, 155.

scan\_type: MMIX-SIM 152, 153.

schedule: MMIX-PIPE 27, 28, 31, 125, 326, 368.

schedule\_bit: MMIX-PIPE 8, 10, 28, 33.

Schwoon, Stefan: MMIX-ARITH 26.

Sclean: MMIX-PIPE 234.

Sclean\_inc: MMIX-PIPE 234.

Sclean\_loop: MMIX-PIPE 234.

sclock: MMIX-SIM 19, 93, 127, 140.

security violation: MMIX 37.

security\_disabled: MMIX-CONFIG 15, MMIX-PIPE 66, 67.

Sedgewick, Robert: MMIXAL 54.

SEEK\_END: MMIX-IO <u>2</u>, 21, MMIX-SIM 4.

SEEK\_SET: MMIX-IO 2, 21, MMIX-SIM 4, 45, <u>46</u>.

segments: MMIX 44, 45, 47, MMMIX 9.

Seidel, Raimund: MMIX-SIM 16.

self: MMIX-PIPE 124, 125, 134, 215, 217, 222, 224, 225, 226, 233, 234, 237, 257, 259, 260, 261, 262, 264, 266, 272, 274, 279, 298, 300, 301, 310, 350, 356, 358, 359, 360, 361, 362, 364, 365, 366, 367, 368.

sentinel: MMIX-PIPE 35, 36, 125.

serial: MMIX-CONFIG 22, MMIX-PIPE 164, 186, 187, 189, 191, MMIXAL 58, 59, 73, 74, 75, 78, 100, 109, 112, 114, 118, 125, 130.

serial number: MMIXAL 11, 21.

serial\_number: MMIXAL 59, 60, 109.

serialize: MMIXAL 59, 82, 86, 100.

SET: MMIX 10, MMIXAL 13, 62, 63, 124.

set: MMIX-CONFIG 28, 32, MMIX-PIPE 49, 51, 109, 137, 167, 177, 181, 192, 233, 234, 343, MMMIX 12, 23.

set\_l: MMIX-PIPE 44, 46, 100, 112, 114, 119, 120, 123, 145, 146, 147, 334, 338.

set\_lock: MMIX-PIPE 37, 81, 215, 217, 219, 222, 224, 225, 226, 233, 234, 237, 259, 260, 261, 262, 264, 271, 272, 274, 276, 277, 297, 298, 300, 310, 358, 359, 360, 361, 362, 365, 366, 367, 368.

set\_round: MMIX-PIPE 281, 346.

set\_type: MMIX-SIM 153.

SETH: MMIX 13, MMIX-PIPE 47, 112, 323, MMIX-SIM <u>54</u>, 71, 85, MMIXAL 63, <u>128</u>.

SETL: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63, <u>128</u>.

SETMH: MMIX 13, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85, MMIXAL 63.

SETML: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.

setsz: MMIX-CONFIG 13, 15, 23.

seven\_octa: MMMIX 23, 25.

sfile: MMIX-IO 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23.

SFLOT: MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM <u>54</u>, 89, MMIXAL 63.

SFLOTI: MMIX-PIPE 47, MMIX-SIM 54, 89.

SFLOTU: MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

SFLOTUI: MMIX-PIPE 47, MMIX-SIM 54, 89.

sfpack: MMIX-ARITH 34, 39, 40, 90.

sfunpack: MMIX-ARITH 38, 39, 90.

sh: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 141.

sh\_check: MMIXAL 97.

shift\_amt: MMIX-PIPE 141, MMIX-SIM 87.

shift\_left: MMIX-ARITH 7, 31, 34, 37, 38, 43, 45, 47, 49, 51, 52, 53, 55, 63, 73, 87, 88, 89, 92, 94, 95, MMIX-MEM <u>1</u>, 2, MMIX-PIPE <u>21</u>, 22, 113, 114, 118, 139, 141, 244, 279, 282, 333, 339, MMIX-SIM 13, 14, 85, 87, 94, 95, 154, 155, MMIXAL 28, 29, 94, 95, 101.

shift\_right: MMIX-ARITH <u>7</u>, 29, 31, 33, 39, 45, 47, 49, 51, 53, 63, 87, 88, 89, 94, MMIX-MEM <u>1</u>, 3, MMIX-PIPE <u>21</u>, 141, 239, 243, 279, 282, 334, 343, MMIX-SIM 13, 87, 94, 95, MMIXAL 28, 101, 114, 126, 131.

shl: MMIX-PIPE 49, 51, 141, MMIXAL 82, 97, 101.

shlu: MMIX-PIPE 49, 51, 141.

short float: MMIX 26, 27.

show\_breaks: MMIX-SIM 161, 162.

show\_line: MMIX-SIM 47, 50, 51, 82, 83, 103, 105, 128.

show\_pred\_bit: MMIX-PIPE 8, 46, 152, 160.

show\_spec\_bit: MMIX-MEM 2, 3, MMIX-PIPE 8.

show\_stats: MMIX-SIM 128, 140, 141, 149.

show\_wholecache\_bit: MMIX-PIPE 8, 177.

showing\_source: MMIX-SIM 48, 49, 51, 53, 128, 143.

showing\_stats: MMIX-SIM 128, 129, 141, 143.

shown\_file: MMIX-SIM 47, 48, 49, 53.

shown\_line: MMIX-SIM 47, <u>48</u>, 49, 53, 128.

shr: MMIX-PIPE <u>49</u>, 51, 141, MMIXAL <u>82</u>, 97, 101.

shrt: MMIX-PIPE 21, MMIX-SIM 13.

shru: MMIX-PIPE 49, 51, 141.

SIGINT: MMIX-SIM 147, 148.

sign: MMIX-ARITH 68, 70, 73, 84.

sign\_bit: MMIX-ARITH <u>4</u>, 12, 24, 33, 35, 37, 38, 39, 40, 41, 44, 46, 54, 87, 89, 91, 93, MMIX-CONFIG 32, 33, MMIX-IO 21, MMIX-PIPE 80, 81, 82, 85, 89, 91, 100, 113, 118, 119, 140, 143, 144, 149, 157, 160, 177, 179, 205, 230, 233, 234, 244, 266, 271, 279, 288, 296, 320, 322, 331, 346, 353, 354, 355, 364, 368, MMIX-SIM <u>15</u>, 84, 85, 89, 90, 91, 94, 95, 108, 123, 124, 127, 157, 159, 161.

signal: MMIX-SIM 147, 148.

signaling NaN: MMIX 21.

signed integers: MMIX 6, 7.

signed\_odiv: MMIX-ARITH 24, MMIX-PIPE 21, 343, MMIX-SIM 13, 88.

signed\_omult: MMIX-ARITH 12, MMIX-PIPE 21, 343, MMIX-SIM 13, 88.

sim: MMIX-PIPE 21, MMIX-SIM 13.

sim\_file\_info: MMIX-IO 5, 6.

Singh, Balbir: MMIX-ARITH 26.

Sites, Richard Lee: MMIX 3, 40.

size: MMIX-IO <u>4</u>, <u>12</u>, 13, <u>14</u>, 15, <u>16</u>, 17, <u>18</u>, MMIX-MEM 1, <u>2</u>, <u>3</u>, MMIX-PIPE <u>208</u>, <u>246</u>, 256, 260, <u>381</u>, <u>384</u>, MMIX-SIM 4, <u>114</u>, <u>117</u>.

SL: MMIX 14, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.

sleep: MMIX-PIPE 125, 224, 257, 272, 274, 298, 300, 301.

sleepy: MMIX-PIPE 301, 302, 303.

SLI: MMIX-PIPE 47, MMIX-SIM 54, 87.

SLU: MMIX 14, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.

SLUI: MMIX-PIPE 47, MMIX-SIM 54, 87.

sl3: MMMIX 19, 20.

Sorry, I can't open...: MMIX-SIM 145, 146.

source: MMIXAL 126, 131.

spec: MMIX-PIPE <u>40</u>, 41, 42, 43, 44, 92, 93, 284.

spec\_bit: MMIXAL 62, 102.

spec\_install: MMIX-PIPE 94, 95, 110, 112, 113, 114, 117, 118, 119, 120, 121, 312, 322, 333, 334, 338, 339, 340, 355.

spec\_mode: MMIXAL 43, 44, 52, 102, 132.

spec\_mode\_loc: MMIXAL 43, 52, 132.

spec\_read: MMIX-MEM 1, 2, MMIX-PIPE 206, <u>208</u>, 271.

spec\_reg\_code: MMIX-SIM <u>151</u>, 152.

spec\_regg\_code: MMIX-SIM 151, 152.

spec\_rem: MMIX-PIPE 96, 97, 123, 145, 146, 147, 256.

spec\_write: MMIX-MEM 1, 3, MMIX-PIPE 206, 208, 246, 260.

special registers: MMIX 39, 43.

special\_name: MMIX-PIPE 53, 91, MMIX-SIM 56, 103, 105, 138, MMIXAL 66, 67.

special\_reg: MMIX-SIM 55.

specnode: MMIX-CONFIG 37, MMIX-PIPE 40, 43, 44, 71, 86, 92, 93, 94, 95, 96, 97, 100, 115, 120, 255, MMMIX 23.

specnode\_struct: MMIX-PIPE 40.

specval: MMIX-PIPE 92, 93, 104, 105, 106, 108, 113, 114, 118, 120, 122, 312, 322, 323, 324, 339.

speed\_lock: MMIX-PIPE 39, 247, 257, 362.

Sprep: MMIX-PIPE 233, <u>234</u>.

sprintf: MMIX-SIM 24, 45, 80, 101, MMIXAL 45, 138, MMOTYPE 28.

square\_one: MMIX-PIPE 272, 369, 370.

SR: MMIX 14, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 87, MMIXAL 63.

src\_file: MMIX-SIM 42, 45, 47, 48, 49, MMIXAL 34, 35, 138, 139.

src\_file\_name: MMIXAL 137, 138, 139, 140.

SRI: MMIX-PIPE 47, MMIX-SIM 54, 87.
SRU: MMIX 14, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.
SRUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 87.
sscanf: MMIX-CONFIG 38, MMIX-SIM 143, MMIXAL 137, MMMIX 7, 8, 15, 18, 19, 21.
st: MMIX-CONFIG 27, 28, MMIX-PIPE 49, 51, 117, 254, 265, 266, 267, 270, 271, 272, 279, 280, 321, 327.
st\_mtime: MMIX-SIM 44.
st\_ready: MMIX-PIPE 267, 270, 271, 272, 280.
stab\_start: MMOTYPE 25, 29, 30.
Stack overflow: MMIX 45.
stack pointer: MMIXAL 18.
stack\_alert: MMIX-PIPE 44, 100, 113, 146, 269.
stack\_load: MMIX-SIM 83, 101, 104.
stack\_op: MMIXAL 82, 83, 84.
STACK\_OVERFLOW: MMIX-PIPE 57, 146.
stack\_overflow: MMIX-PIPE 146, 148.
Stack\_Segment: MMIX-SIM 3, 37, MMIXAL 69.
stack\_store: MMIX-SIM 81, 82, 83, 101, 102, 103.
stack\_tracing: MMIX-SIM 61, 82, 83, 103, 105, 143.
stage: MMIX-CONFIG 26, 34, 35, 36, MMIX-PIPE 23, 25, 26, 28, 39, 59, 124, 125, 126, 128, 129, 134, 136, 174, 231, 236, 249, 284.
stages: MMIX-CONFIG 27, 28, 29.
stall: MMIX-PIPE 75, 82, 101, 102, 111, 120, 312, 322, 332.
stamp: MMIX-PIPE 246, 251, 256, 257, MMIX-SIM 16, 17, 21.
standard floating point conventions: MMIX 22.
standard\_NaN: MMIX-ARITH <u>4</u>, 41, 44, 46, 91, 93.
start\_fetch: MMIX-PIPE 288, 289.
start\_ld\_st: MMIX-PIPE 265.
startup: MMIX-PIPE 30, 31, 81, 203, 219, 221, 225, 233, 244, 249, 257, 259, 260, 261, 266, 267, 271, 272, 273, 274, 276, 277, 286, 287, 288, 291, 296, 297, 298, 300, 353, 354, 358, 359, 360, 361, 365, 366.
stat: MMIXAL 82.
stat: MMIX-SIM 43, 44.
stat\_buf: MMIX-SIM 44.
state: MMIX-PIPE 30, 31, 44, 46, 124, 125, 130, 131, 133, 134, 135, 215, 217, 219, 222, 224, 232, 233, 234, 237, 257, 259, 260, 262, 264, 265, 267, 268, 270, 271, 272, 273, 274, 276, 277, 278, 279, 280, 281, 288, 291, 292, 295, 296, 297, 298, 300, 301, 310, 325, 326, 345, 351, 354, 358, 359, 360, 361, 364, 368, MMIX-SIM 160.
state\_4: MMIX-PIPE 308, 310, 311.
state\_5: MMIX-PIPE 307, <u>310</u>, 311.
status: MMIXAL 82, 87, 94, 98, 99, 100, 101, 109, 110, 116, 117, 118, 121, 122, 123, 124, 129.
STB: MMIX 8, MMIX-PIPE 47, 256, 281, MMIX-SIM 54, 95, 123, MMIXAL 63.
STBI: MMIX-PIPE 47, MMIX-SIM 54, 95.
STBU: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.
STBUI: MMIX-PIPE 47, MMIX-SIM 54, 95.
STCO: MMIX 8, MMIX-PIPE 47, 117, 256, MMIX-SIM 54, 95, MMIXAL 63.
STCOI: MMIX-PIPE 47, MMIX-SIM 54, 95.
StdErr: MMIX-SIM 4, 134, 137, MMIXAL 69.
stderr: MMIX-CONFIG 8, MMIX-IO 7, MMIX-PIPE 13, 381, 384, MMIX-SIM 4, 14, 24, 26, 35, 44, 49, 143, 145, 146, MMIXAL 35, 45, 79, 137, 142, 145, MMMIX 3, 6, 7, 8, 9, 10, 11, 12, MMOTYPE 2, 3, 9, 14, 20, 23, 25, 26, 30.
StdIn: MMIX-SIM 4, 134, 137, MMIXAL 69.
stdin: MMIX-IO 7, 10, 13, 15, 17, MMIX-MEM 2, MMIX-PIPE 387, MMIX-SIM 4, 120, 150, MMMIX 13.
StdIn>: MMIX-PIPE 387, MMIX-SIM 120.
stdin\_buf: MMIX-PIPE 387, 388, MMIX-SIM 120, <u>121</u>.
stdin\_buf\_end: MMIX-PIPE 387, 388, MMIX-SIM 120, <u>121</u>.
stdin\_buf\_start: MMIX-PIPE 387, 388, MMIX-SIM 120, 121.
stdin\_chr: MMIX-IO 4, 13, 15, 17, MMIX-PIPE 377, 387, MMIX-SIM 120.
StdOut: MMIX-SIM 4, 134, 137, MMIXAL 69.
stdout: MMIX-IO 7, MMIX-PIPE 387, MMIX-SIM 4, 120, 133, 137, 138, 150, 156, 159, MMMIX 13.
STHT: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM <u>54</u>, 95, MMIXAL 63.
STHTI: MMIX-PIPE 47, MMIX-SIM 54, 95.
sticky bit: MMIX-ARITH 31, 34, 49, 53, 79, 87.
STO: MMIX 8, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 95, MMIXAL 63.
STOI: MMIX-PIPE 47, MMIX-SIM 54, 95.
Stone, Harold Stuart: MMIX 31.
stop: MMIX-IO 4, MMIX-PIPE 381, 382, 383, MMIX-SIM <u>114</u>, 115, 116.
store\_fx: MMIX-SIM 89.
store\_new\_char: MMIXAL 57.
store\_sf: MMIX-ARITH 40, MMIX-PIPE 21, 281, MMIX-SIM 13, 95.
store\_x: MMIX-SIM 84, 85, 86, 87, 88, 89, 90, 92, 94, 97, 102, 107.
STOU: MMIX 8, MMIX-PIPE 47, 113, 339, MMIX-SIM <u>54</u>, 95, MMIXAL 63.
STOUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 95.
strcmp: MMIX-CONFIG 18, 19, 20, 21, 22, 23, 24, 38, MMIXAL 38, MMMIX 4, MMOTYPE 28.
strcpy: MMIX-CONFIG 10, 18, 25, 38, MMIX-SIM 96, 97, 98, 107, 108, 149, MMIXAL 137, 138, MMOTYPE 28.
stream\_name: MMIX-SIM 137.
string: MMIX-IO 19, 20, MMIX-SIM 4.
strlen: MMIX-ARITH 67, MMIX-CONFIG 10, 25, 27, 38, MMIX-PIPE 387, MMIX-SIM 4, 24, 42,

45, 120, 143, 149, 150, 163, MMIXAL 34, 50, 138, MMMIX 4, 6, MMOTYPE 28.
strncmp: MMIX-ARITH 68, 79.
strncpy: MMOTYPE 28.
strong: MMIXAL 82, 83.
STSF: MMIX 26, MMIX-PIPE 47, 256, 281, MMIX-SIM 54, 95, MMIXAL 63.
STSFI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 95.
STT: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.
STTI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 95.
STTU: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.
STTUI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 95.
STUNC: MMIX 30, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.
stunc: MMIX-PIPE 49, 251, 254, 257, 281.
STUNCI: MMIX-PIPE 47, MMIX-SIM 54, 95.
STW: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.
STWI: MMIX-PIPE 47, MMIX-SIM 54, 95.
STWU: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM <u>54</u>, 95, MMIXAL 63.
STWUI: MMIX-PIPE 47, MMIX-SIM 54, 95.
style: MMIX-SIM 133, 134, 137.
SUB: MMIX 9, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 85, MMIXAL 63.
sub: MMIX-CONFIG 28, MMIX-PIPE 44, 49, 51, 140.
SUBI: MMIX-PIPE 47, MMIX-SIM 54, 85.
subnormal numbers: MMIX 21.
subroutine library initialization: MMIX-SIM 6, 164.
SUBSUBVERSION: MMIX-PIPE 89, MMIX-SIM 77.
SUBU: MMIX 9, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.
subu: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 139.
SUBUI: MMIX-PIPE 47, MMIX-SIM 54, 85.
SUBVERSION: MMIX-PIPE 89, MMIX-SIM 77.
support: MMIX-PIPE 78, 79, 80.
suppress\_dispatch: MMIX-PIPE 64, 65, 317.
switchable\_string: MMIX-SIM 138, 139.
switch0: MMIX-PIPE <u>288</u>, 299.
switch1: MMIX-PIPE 130, 133, 265, 327, 345, 359, 360.
switch2: MMIX-PIPE 135, 364.
SWYM: MMIX 49, MMIX-PIPE 47, 301, 321, 323, 325, MMIX-SIM 54, 107, MMIXAL 63.
swym\_one: MMIX-PIPE 301, 302.
sy: MMIX-ARITH 24.
sym: MMIXAL <u>54</u>, 64, 66, 70, 71, 72, 73, 74, 75, 76, 78, 87, 91, 100, 104, 110, 111, 118, 125, 130, 144.
sym\_avail: MMIXAL 59, <u>60</u>.
sym\_buf: MMIXAL 75, 77, 78, 79, 80, MMOTYPE 25, 26, 27, 28, 29.
sym\_length\_max: MMOTYPE 26, 29.
sym\_node: MMIXAL 58, 59, 60, 65, 74, 90, 109.
sym\_ptr: MMIXAL 75, 77, 78, 79, 80, MMOTYPE 25, 26, 28, 29.
sym\_root: MMIXAL 60.
sym\_tab\_struct: MMIXAL 54, 58.
Symbol table...: MMOTYPE 22, 25.
symbol...already defined: MMIXAL 109.
symbol\_found: MMIXAL 87, 88, 89.
SYNC: MMIX 31, MMIX-PIPE 47, 304, 323, MMIX-SIM 54, 107, MMIXAL 63.
sync: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 230, 233, 234, 251, 254, 256, 257, 355, 356, 361.
sync\_check: MMIX-PIPE 269, 271, 370.
sync\_L: MMIX-SIM 101.
SYNCD: MMIX 30, MMIX-PIPE 47, MMIX-SIM 54, 106, MMIXAL 63.
syncd: MMIX-PIPE 49, 51, 230, 265, 269, 271, 280, 320, 323, 364, 368, 369.
SYNCDI: MMIX-PIPE 47, MMIX-SIM 54, 106.
SYNCID: MMIX 30, MMIX-PIPE 47, MMIX-SIM 54, 106, MMIXAL 63.
syncid: MMIX-PIPE 49, 51, 85, 119, 265, 266, 267, 269, 270, 271, 272, 280, 320, 323.
SYNCIDI: MMIX-PIPE 47, MMIX-SIM 54, 106.
syntax error...: MMIXAL 86, 97.
syntax of floating point constants: MMIX-ARITH 68.
sys\_call: MMIX-PIPE 371, MMIX-SIM 59.
system dependencies: MMIX-ARITH 3, MMIX-IO 16, MMIX-PIPE 17, 89, MMIX-SIM 10, 43, 44, 77, MMIXAL 26, MMOTYPE 27.
System/360: MMIX 7.
System/370: MMIX 31.
sz: MMIX-ARITH 24.
t: MMIX-ARITH 8, 13, 39, 40, 89, 90, MMIX-PIPE <u>35</u>, <u>82</u>, <u>95</u>, <u>97</u>, <u>197</u>, <u>241</u>, MMIX-SIM <u>15</u>, <u>166</u>, MMIXAL <u>48</u>, <u>55</u>, <u>57</u>, <u>73</u>, <u>74</u>, MMOTYPE 8.
tag: MMIX-CONFIG 32, 33, MMIX-PIPE 167, 172, 176, 177, 179, 185, 193, 196, 197, 201, 203, 205, 206, 210, 213, 216, 217, 218, 219, 221, 223, 226, 233, 234, 245, 259, 276, 353, 354, 378, 379, MMMIX 12, 23.
tagmask: MMIX-CONFIG 31, MMIX-PIPE 167, 192, 193, 205.
tail: MMIX-PIPE 64, 69, 71, 73, 74, 85, 120, 160, 301, 304, 308, 309, 316, MMMIX 12, 22.
TC: MMIX 46.
TDIF: MMIX 11, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.
tdif: MMIX-CONFIG 28, MMIX-PIPE <u>49</u>, 51, 344.
tdif\_l: MMIX-PIPE 344, MMIX-SIM 87.
TDIFI: MMIX-PIPE 47, MMIX-SIM 54, 87.
terabytes: MMIX 42, 45.
terminate: MMIX-PIPE 125, 126, 144, 215, 217, 221, 222, 224, 232, 237.
terminator: MMIXAL <u>57</u>, 87.
ternary\_trie\_struct: MMIXAL 54.

test\_load\_bkpt: MMIX-SIM 83, 94, 96, 105, 111, 114.
test\_overflow: MMIX-SIM 88.
test\_store\_bkpt: MMIX-SIM 82, 95, 96, 103, 117.
tet: MMIX-SIM 16, 25, 26, 28, 30, 33, 34, 37, 51, 63, 82, 83, 94, 95, 96, 103, 105, 111, 114, 118, 119, 157, 159, 163, 164, 165, MMOTYPE 9, 11, 15, 18, 19, 21, 23, 24.
TETRA: MMIXAL 17, <u>62</u>, 63, 117.
tetra: MMIX-ARITH 3, 7, 8, 13, 26, 27, 28, 29, 34, 38, 39, 40, 54, 59, 60, 61, 62, 82, 90, MMIX-IO 3, MMIX-PIPE 17, 21, 68, 73, 76, 78, 91, 120, 206, 210, 213, 246, 255, MMIX-SIM 10, 13, 15, 16, 19, 25, 31, 44, 61, 62, 95, 101, 114, 164, 165, 166, MMIXAL 26, 43, 48, 52, 68, 76, 105, 120, MMMIX 15, 20, MMOTYPE 7, 8, 11.
tetrabyte: MMIX 6.
Text\_Segment: MMIX-SIM 3.
TextRead: MMIX-SIM 4, MMIXAL 69.
TextWrite: MMIX-SIM 4, MMIXAL 69.
The number of local...: MMIX-SIM 77.
the operand is undefined: MMIXAL 109,
The symbol table isn't...: MMOTYPE 30.
thinking big: MMIX-PIPE 58, 74.
third\_operand: MMIX-PIPE 103, 107, 108, MMIX-SIM <u>64</u>, 71, 79.
This can't happen: MMIX-PIPE 13.
three\_arg\_bit: MMIXAL 62, 116.
thresh: MMIX-ARITH 93, 94.
ticks: MMIX-MEM 2, 3, MMIX-PIPE 10, 14, 28, 64, 87, 187, 251, 256, 257, MMMIX 2, 15, 23.
time: MMIX-PIPE 89, MMIX-SIM 77, MMIXAL 141.
times: MMIXAL <u>82</u>, 97, 101.
tininess: MMIX-ARITH 31.
TLB: MMIX-PIPE 163.
tmp: MMIX-SIM 31, 34, MMMIX 16, 18, 22, MMOTYPE <u>16</u>, 19, 24.
tmpo: MMIX-PIPE 141.
token: MMIX-CONFIG 9, 10, 11, 18, 19, 20, 21, 22, 23, 24, 25.
token\_prescanned: MMIX-CONFIG 9, 10, 22, 24.
Tomasulo, Robert Marco: MMIX-PIPE 58.
too many global registers: MMIXAL 108.
too many operands...: MMIXAL 116.
top\_op: MMIXAL <u>83</u>, 85.
top\_val: MMIXAL 83, 87, 94, 98, 99, 100, 101.
trace\_bit: MMIX-SIM <u>58</u>, 63, 161, 162.
trace\_format: MMIX-SIM 64, 65, 131.
trace\_print: MMIX-SIM 136, 137.
trace\_threshold: MMIX-SIM 61, 63, 143.
tracing: MMIX-SIM 61, 63, 82, 83, 93, 103, 105, 107, 122, 127, 128, 149.
tracing\_exceptions: MMIX-SIM <u>61</u>, 122, 143.
trailing characters...: MMIXAL 35.
trans: MMIX-PIPE 241.
trans\_key: MMIX-PIPE 240, 245, 267, 272, 291, 298, 302, 326, 353, 354.
translation caches: MMIX 46, 47, 49, MMIX-PIPE 163.
TRAP: MMIX 33, 36, 50, MMIX-PIPE 47, 80, 82, 320, MMIX-SIM <u>54</u>, 108, MMIXAL 63.
trap: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 80, 81, 82, 85, 103, 149, 310, 312, 313, 317, 320.
trap\_format: MMIX-SIM 108, 110, 139.
trap\_loc: MMIX-PIPE <u>373</u>.
traps: MMIX 35.
trie\_node: MMIXAL 54, 55, 56, 57, 65, 73, 74, 82, 90.
trie\_root: MMIXAL 56, 61, 66, 70, 71, 80, 87, 111, 144.
trie\_search: MMIXAL 57, 64, 66, 70, 71, 87, 104, 111, 144.
TRIP: MMIX 33, 35, 50, MMIX-PIPE 47, MMIX-SIM 54, 108, 123, MMIXAL 63.
trip: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 80, 85, 312, 313, 317.
trip\_warning: MMIX-IO 23, 24.
tripping: MMIX-SIM 61, 123, 131.
trips: MMIX 35.
true: MMIX-ARITH 1, 24, 68, MMIX-CONFIG 15, 22, 24, MMIX-PIPE 11, 59, 68, 85, 89, 100, 106, 108, 110, 112, 113, 114, 117, 118, 119, 120, 121, 144, 146, 170, 185, 217, 227, 236, 238, 239, 259, 262, 263, 265, 302, 304, 310, 312, 314, 316, 317, 322, 324, 330, 331, 332, 333, 334, 337, 338, 339, 340, 345, 350, 355, 361, 364, 373, MMIX-SIM 9, 45, 51, 63, 82, 83, 87, 90, 93, 103, 105, 107, 109, 122, 123, 125, 127, 128, 141, 142, 143, 148, 149, 150, 153, 164, MMIXAL <u>26</u>, 35, 41, 71, 87, 111, 132, MMMIX 6, 7, 8, 9, 10, 11.
true\_head: MMIX-PIPE 74, 81.
try\_complement: MMIX-ARITH 94, 95.
trying\_to\_interrupt: MMIX-PIPE 314, 315, 330, 351, 363, 364.
tt: MMIX-ARITH 65, 66, 81, 83, MMIX-PIPE 28, MMIXAL <u>57</u>, 64, <u>65</u>, 66, 70, 87, 88, 89, 111.
Two file names...: MMOTYPE 20.
two\_arg\_bit: MMIXAL 62, 116.
Type tetra...: MMIXAL 29.
t0: MMMIX <u>10</u>.
t1: MMMIX 10.
t2: MMMIX 10.
t3: MMMIX 10.
v: MMIX 50, MMIX-SIM 1.
u: MMIX-ARITH <u>7</u>, <u>8</u>, <u>13</u>, <u>89</u>, MMIX-MEM <u>1</u>, MMIX-PIPE 21, 75, 79, 97, MMIX-SIM 13, MMIXAL 28.
U\_BIT: MMIX-ARITH 31, 33, 35, MMIX-PIPE 54, 307, MMIX-SIM <u>57</u>, 89, 122, MMIXAL 69.
U\_Handler: MMIXAL 69.
unary: MMIXAL <u>82</u>, 83.
unary\_check: MMIXAL 100.

undefined: MMIXAL 82, 87, 99, 101, 109, 110, 117, 118, 121, 122, 123, 124, 129.
undefined constant: MMIXAL 118.
undefined local symbol: MMIXAL 145.
undefined symbol: MMIXAL 79.
underflow: MMIX 21, 22, 32, MMIX-ARITH 31.
undump\_octa: MMMIX 9, 10, 11.
Unexpected end of file...: MMMIX 11, MMOTYPE 9.
Unicode: MMIX 6, MMIXAL 5, 6, 7, 30, 75, MMOTYPE 27.
uninit\_mem\_bit: MMIX-PIPE 8, 210.
uninitialized memory...: MMIX-PIPE 210.
unit\_busy: MMIX-PIPE 82.
unit\_found: MMIX-PIPE 82.
Unknown lopcode: MMOTYPE 13.
unknown operation code: MMIXAL 104.
UNKNOWN\_SPEC: MMIX-PIPE 71, 73, 85, 120, 123, 290, 309.
unsav: MMIX-PIPE 49, 327, 332.
UNSAVE: MMIX 43, 50, MMIX-PIPE 47, 81, 102, 279, 332, 335, MMIX-SIM <u>54</u>, 104, 164, MMIXAL 63, MMMIX 12.
unsave: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 327, 332.
unschedule: MMIX-PIPE <u>32</u>, <u>33</u>, 145, 287.
unsgnd: MMIX-PIPE <u>21</u>, MMIX-SIM <u>13</u>.
Unsupported virtual address: MMMIX 11.
up: MMIX-PIPE 40, 73, 85, 86, 89, 93, 95, 97, 100, 102, 114, 116, 117, 120, 146, 227, 254, 255, 312, 333, 334.
update\_listing\_loc: MMIXAL 42, 44.
usage: MMIX-PIPE 44, 46, 81, 100, 146, 324, MMIX-SIM <u>143</u>.
Usage: ...: MMIX-SIM 143, MMIXAL 137, MMMIX 3, MMOTYPE 2.
usage\_help: MMIX-SIM 143, 144.
use\_and\_fix: MMIX-PIPE <u>195</u>, <u>196</u>, 198, 201, 217, 262, 268, 270, 271, 272, 273, 292, 293, 296, 353, 354.
useful: MMIXAL 73.
v: MMIX-ARITH 8, 13, MMIX-CONFIG 11, 12, 13, 14, MMIX-PIPE 167.
V\_BIT: MMIX-ARITH 31, MMIX-PIPE 54, 140, 141, 282, 343, MMIX-SIM 57, 84, 85, 87, 88, 95, MMIXAL 69.
V\_Handler: MMIXAL 69.
val: MMIX-ARITH 68, 69, 71, 73, 83, 84, MMIX-MEM 2, 3, MMIX-PIPE 208, 212, 213, 379, MMIX-SIM 13, 30, 153, 155, 157, 158, 161, MMMIX 17.
val\_node: MMIXAL 82, 83, 84.
val\_ptr: MMIXAL <u>83</u>, 85, 87, 94, 99, 110, 116, 117.
val\_stack: MMIXAL 81, 82, 83, 84, 85, 108, 109, 110, 116, 117, 118, 121, 122, 123, 124, 125, 126, 127, 129, 130, 131, 132, 134.
Vandevoorde, Mark Thierry: MMIX 40.
vanish: MMIX-CONFIG 34, MMIX-PIPE 126, 128, 129, 260.
vanish\_ctl: MMIX-PIPE 127, 128.
vctsz: MMIX-CONFIG 13, 15, 23.
verb: MMIXAL 100, 101.
verbose: MMIX-MEM 2, 3, MMIX-PIPE 4, 10, 28, 33, 46, 81, 125, 145, 146, 147, 149, 152, 160, 177, 210, 283, 310, 314, 319, 320, 321, MMIX-SIM 140, MMMIX 15, MMOTYPE 2, 4, 9, 30.
VERSION: MMIX-PIPE 89, MMIX-SIM 77.
version number: MMIX 41, 51.
vh: MMIX-ARITH <u>13</u>, 17, 21.
victim: MMIX-CONFIG 33, MMIX-PIPE 167, 177, 181, 193, 196, 199, 205, 233, 234.
VIIIADDU: MMIX-PIPE 47, MMIX-SIM 54, 85.
VIIIADDUI: MMIX-PIPE 47, MMIX-SIM 54, 85.
virt: MMIX-PIPE 241.
virtual address emulation: MMIX 49.
virtual addresses: MMIX 44, 45, 47.
vmh: MMIX-ARITH 13, 17, 21.
vrepl: MMIX-CONFIG 16, 23, MMIX-PIPE 167, 196, 199, 205.
Vuillemin, Jean Etienne: MMIX-SIM 16.
vv: MMIX-CONFIG 16, 23, 31, 33, MMIX-PIPE 167, 177, 181, 193, 196, 199, 205, 233, 234.
w: MMIX-ARITH <u>8</u>, MMIX-SIM <u>61</u>.
W\_BIT: MMIX-ARITH 31, 88, MMIX-PIPE 54, 346, MMIX-SIM <u>57</u>, 89, MMIXAL 69.
W\_Handler: MMIXAL 69.
wait: MMIX-PIPE 125, 131, 133, 134, 215, 216, 217, 218, 219, 221, 222, 223, 224, 225, 233, 234, 237, 257, 259, 260, 261, 262, 263, 264, 266, 271, 272, 273, 276, 277, 278, 279, 281, 283, 288, 290, 297, 298, 301, 310, 326, 328, 329, 330, 342, 350, 351, 353, 354, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368.
wait\_or\_pass: MMIX-PIPE 288, 292, 295, 296.
Waldspurger, Carl Alan: MMIX 40.
wbuf\_bot: MMIX-CONFIG 37, MMIX-PIPE 247, 251, 255, 256, 257, 378, 379.
wbuf\_lock: MMIX-PIPE 39, 247, 256, 257, 259, 260, 262, 264, 360.
wbuf\_top: MMIX-CONFIG 37, MMIX-PIPE 247, 249, 251, 255, 256, 257, 378, 379.
wcslen: MMIX-SIM 4.
WDIF: MMIX 11, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 87, MMIXAL 63.
wdif: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 344.
WDIFI: MMIX-PIPE 47, MMIX-SIM 54, 87.
weak: MMIXAL 82, 83.
Weihl, William Edward: MMIX 40.
what\_say: MMIX-SIM <u>149</u>, 152, MMMIX <u>13</u>, 15, 18, 19.
Wheeler, David John: MMIX-ARITH 26.

Wilkes, Maurice Vincent: MMIX 30, MMIX-ARITH 26.
Wirth, Niklaus Emil: MMIX 7.
wow: MMIX-PIPE 11.
wra: MMIX-CONFIG 13, 15, 23.
wrb: MMIX-CONFIG 13, 15, 23.
WRITE\_ALLOC: MMIX-CONFIG 23, 31, MMIX-PIPE <u>166</u>, 167, 217, 257.
WRITE\_BACK: MMIX-CONFIG 23, MMIX-PIPE <u>166</u>, 167, 217, 263.
write\_bit: MMIX-SIM <u>58</u>, 82, 161, 162.
write\_buf\_size: MMIX-CONFIG 15, 37.
write\_co: MMIX-PIPE <u>248</u>, 249.
write\_ctl: MMIX-PIPE 248, 249, 360.
write\_from\_wbuf: MMIX-PIPE 129, 249, 257,
write\_head: MMIX-PIPE 247, 249, 251, 255, 256, 257, 259, 260, 261, 262, 360, 362, 378, 379.
write\_node: MMIX-CONFIG 37, MMIX-PIPE 246, 247, 251, 255, 256, 378, 379.
write\_restart: MMIX-PIPE 257, 261.
write\_search: MMIX-PIPE 254, 255, 268, 270, 271, 278.
write\_tail: MMIX-PIPE 247, 249, 251, 255, 256, 257, 360, 362, 378, 379.
WYDE: MMIXAL 17, 62, 63, 117.
wyde: MMIX 6.
wyde\_diff: MMIX-ARITH 28, MMIX-PIPE 21, 344, MMIX-SIM 13, 87.
x: MMIX-ARITH <u>5</u>, <u>6</u>, <u>13</u>, <u>25</u>, <u>26</u>, <u>27</u>, <u>29</u>, <u>37</u>, <u>38</u>, 39, 40, 41, 44, 46, 54, 60, 62, 81, 82, 85, 91, 93, MMIX-IO 22, MMIX-PIPE 21, 44, 56, <u>119</u>, <u>120</u>, <u>381</u>, <u>384</u>, MMIX-SIM <u>13</u>, <u>61</u>, <u>114</u>, MMIXAL 28, 48, 76, 120, MMOTYPE 8.
X field doesn't fit...: MMIXAL 123.
X field is undefined: MMIXAL 123.
X field...register number: MMIXAL 123.
X\_BIT: MMIX-ARITH 31, 33, 35, MMIX-PIPE 54, 307, MMIX-SIM 57, 122, MMIXAL 69.
x\_bits: MMIXAL 52.
X\_Handler: MMIXAL 69.
X\_is\_dest\_bit: MMIX-PIPE 83, 101, 312, 320, MMIX-SIM 60, <u>65</u>, 126.
X\_is\_source\_bit: MMIX-SIM <u>65</u>.
x\_ptr: MMIX-SIM 61, 80, 84.
xar\_bit: MMIXAL 62, 123.
xe: MMIX-ARITH 41, 43, 44, 45, 46, 47, 49, <u>91</u>, 92, <u>93</u>, 94.
xf: MMIX-ARITH 41, 43, 44, 45, 46, 47, 86, 87, <u>91</u>, 92, <u>93</u>, 94.
XOR: MMIX 10, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 86, MMIXAL 63.
xor: MMIX-ARITH 29, MMIX-CONFIG 28, MMIX-PIPE <u>21</u>, <u>49</u>, 51, 138, MMIX-SIM <u>13</u>, MMIXAL <u>82</u>, <u>97</u>, <u>101</u>.
XORI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 86.
xr\_bit: MMIXAL 62, 123.
xs: MMIX-ARITH 41, 43, 44, 45, 46, 47, 93, 94.
XVIADDU: MMIX-PIPE 47, MMIX-SIM 54, 85.
XVIADDUI: MMIX-PIPE 47, MMIX-SIM 54, 85.
xx: MMIX-ARITH 26, MMIX-PIPE 44, 46, 100, 102, 106, 110, 114, 117, 118, 119, 120, 146, 227, 265, 275, 312, 320, 323, 325, 329, 332, 335, 336, 337, 340, 341, 364, 369, 370, MMIX-SIM 60, 62, 74, 80, 95, 97, 101, 102, 104, 106, 107, 108, 124.
xyz: MMIXAL 119, 120, 123, 129, 130, 131.
XYZ field doesn't fit...: MMIXAL 129.
xyzar\_bit: MMIXAL 62, 129.
xyzr\_bit: MMIXAL 62, 129.
y: MMIX-ARITH 5, 6, 7, 8, 12, 13, 24, 25, 27, 28, 29, 41, 44, 46, 50, 85, 93, MMIX-MEM <u>1</u>, MMIX-PIPE <u>21</u>, <u>44</u>, MMIX-SIM <u>13</u>, <u>61</u>, MMIXAL <u>28</u>, <u>48</u>, <u>120</u>, MMMIX <u>20</u>, <u>25</u>, MMOTYPE 18.
Y field doesn't fit...: MMIXAL 122.
Y field is undefined: MMIXAL 122.
Y field of lop\_post...: MMOTYPE 22.
Y field...register number: MMIXAL 122.
Y\_is\_immed\_bit: MMIX-SIM 65.
Y\_is\_source\_bit: MMIX-SIM 65.
yar\_bit: MMIXAL 62, 122.
ybyte: MMIX-SIM 28, 29, <u>33</u>, 34, 35.
ye: MMIX-ARITH 41, 43, 44, 45, 46, 47, 48, <u>50</u>, 51, 52, <u>85</u>, <u>93</u>, 94, 95.
Yellin, Frank Nathan: MMIX 40.
yf: MMIX-ARITH 41, 43, 44, 45, 46, 47, 48, 49, <u>50</u>, 51, 52, 53, <u>85</u>, <u>93</u>, 94, 95.
yhl: MMIX-ARITH 7, MMMIX <u>20</u>.
ylh: MMIX-ARITH 7, MMMIX <u>20</u>.
ynp: MMIX-PIPE 241.
yr\_bit: MMIXAL <u>62</u>, 122.
ys: MMIX-ARITH <u>41</u>, <u>44</u>, <u>46</u>, 47, 48, 49, <u>50</u>, 53, <u>85</u>, <u>93</u>, 94.
yt: MMIX-ARITH 41, 44, 46, 50, 52, 85, 93.
yy: MMIX-ARITH 24, MMIX-PIPE 44, 46, 100, 103, 105, 118, 320, 333, 335, 337, 339, 341, 372, 380, MMIX-SIM 60, 62, 71, 73, 97, 102, 104, 107, 108, 111, 124.
yz: MMIX-PIPE 75, 84, 85, 109, 120, MMIX-SIM 60, 62, 70, 78, 101, MMIXAL 48, 120, 122, 123, 124, 125, 126, 127, 128, MMOTYPE 9, <u>11</u>, 13, 18, 19, 20, 21, 25, 30.
YZ field at lop\_end...: MMOTYPE 30.
YZ field doesn't fit...: MMIXAL 124.
YZ field is undefined: MMIXAL 124.
YZ field of lop\_fixrx...: MMOTYPE 19.
YZ field...register number: MMIXAL 124.
YZ field...should be zero: MMOTYPE 25.
YZ field...should be 1: MMOTYPE 13.
yzar\_bit: MMIXAL 62, 124.
yzbytes: MMIX-SIM <u>25</u>, 26, 29, 33, 34, 35, 36.
yzr\_bit: MMIXAL <u>62</u>, 124.
z: MMIX-ARITH <u>5</u>, <u>8</u>, <u>12</u>, <u>13</u>, <u>24</u>, <u>25</u>, <u>27</u>, <u>28</u>, <u>29</u>, 39, 40, 41, 44, 46, 50, 85, 86, 88, 89, 91, 93, MMIX-PIPE 21, 44, MMIX-SIM 13, 61, MMIXAL <u>28</u>, <u>48</u>, <u>120</u>, MMOTYPE <u>18</u>.
Z field doesn't fit...: MMIXAL 121.

Z field is undefined: MMIXAL 121.
Z field of lop\_fixo...: MMOTYPE 19.
Z field of lop\_loc...: MMOTYPE 18.
Z field of lop\_post...: MMOTYPE 22.
Z field...register number: MMIXAL 121.
Z\_BIT: MMIX-ARITH 31, 44, MMIX-PIPE 54, MMIX-SIM 57, MMIXAL 69.
Z\_Handler: MMIXAL 69.
Z\_is\_immed\_bit: MMIX-SIM <u>65</u>.
Z\_is\_source\_bit: MMIX-SIM <u>65</u>.
zap\_cache: MMIX-PIPE 180, 181, 358, 359, 360.
zar\_bit: MMIXAL <u>62</u>, 121.
zbyte: MMIX-SIM 28, 29, 33, 34, 35, 37.
ze: MMIX-ARITH 41, 43, 44, 45, 46, 47, 48, 50, 51, 52, 85, 86, 87, 88, 91, 92, 93, 94, 95.
zero: MMIXAL 82, 83.
zero\_exponent: MMIX-ARITH 36, 37, 38, 51.
zero\_octa: MMIX-ARITH 4, 24, 29, 31, 39, 41, 44, 45, 46, 53, 73, 83, 88, 89, 93, MMIX-IO 4, 8, 11, 14, 16, 18, 19, 20, 21, MMIX-PIPE 20, 100, 112, 179, 237, 243, 244, 265, 271, 279, 288, 312, 317, 330, 346, 356, 364, 380, MMIX-SIM 13, 60, 81, 89, 99, 126, 153, 154, 155, 158, 159, MMIXAL 27, 59, 100, 101, 116, MMMIX 12, 21, 23, 25.
zero\_out: MMIX-ARITH 93, 95.
zero\_spec: MMIX-PIPE 41, 85, 100, 109, 112, 113, 114.
zeros: MMIX-ARITH 74, 76, 77, 79.
zf: MMIX-ARITH 41, 43, 44, 45, 46, 47, 48, 49, <u>50</u>, 51, 52, 53, <u>85</u>, <u>86</u>, 87, <u>88</u>, <u>91</u>, 92, 93, 94, 95.
zhex: MMIX-SIM 134, 135, 137.
zr\_bit: MMIXAL <u>62</u>, 121.
zro: MMIX-ARITH 36, 37, 38, 39, 40, 41, 42, 44, 46, 50, 52, 85, 86, 88, 91, 93.
zs: MMIX-ARITH 41, 44, 46, 47, 48, 49, 50, 53, 85, 86, 87, 88, 91, 93.
zset: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 345.
ZSEV: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
ZSEVI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92.
ZSN: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
ZSNI: MMIX-PIPE 47, MMIX-SIM 54, 92.
ZSNN: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
ZSNNI: MMIX-PIPE 47, MMIX-SIM 54, 92.
ZSNP: MMIX 16, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92, MMIXAL 63.
ZSNPI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92.
ZSNZ: MMIX 16, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92, MMIXAL 63.
ZSNZI: MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92.
ZSOD: MMIX 16, MMIX-PIPE <u>47</u>, MMIX-SIM <u>54</u>, 92, MMIXAL 63.
ZSODI: MMIX-PIPE 47, MMIX-SIM 54, 92.
ZSP: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.
16ADDU: MMIX 9, MMIXAL 63.
2ADDU: MMIX 9, MMIXAL 63.
4ADDU: MMIX 9, MMIXAL 63.
8ADDU: MMIX 9, MMIXAL 63.

108, 109, 124, 133, 138.